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Abstract 

We  introduce  a  rigorous  mathematical  theory  for  the  analysis  of  local  histograms,  and 
study  how  they  interact  with  textures  that  can  be  modeled  as  occlusions  of  simpler 
components.  We  first  show  how  local  histograms  can  be  computed  as  a  system  of 
convolutions  and  discuss  some  basic  local  histogram  properties.  We  then  introduce  a 
probabilistic,  occlusion-based  model  for  textures  and  formally  demonstrate  that  local 
histogram  transforms  are  natural  tools  for  analyzing  the  textures  produced  by  our 
model.  Next,  we  characterize  all  nonlinear  transforms  which  satisfy  the  three  key 
properties  of  local  histograms  and  consider  the  appropriateness  of  local  histogram 
features  in  the  automated  classification  of  textures  commonly  encountered  in  histo¬ 
logical  images.  When  classifying  tissues,  pathologists  indicate  they  focus  on  simple, 
locally-defined  features  that  essentially  involve  pixel  counting,  such  as  the  number  of 
cells  in  a  region  of  given  size,  the  size  of  the  nuclei  within  these  cells,  and  the  distri¬ 
bution  of  color  within  both.  We  discuss  how  local  histogram  transforms  can  be  used 
to  produce  numerical  features  that,  when  fed  into  mainstream  classification  schemes, 
mimic  the  baser  aspects  of  a  pathologist’s  thought  process. 
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LOCAL  HISTOGRAMS  FOR  PER-PIXEL  CLASSIFICATION 


I.  Introduction 

Local  histograms  are  a  well-studied  signal  processing  tool  [6,  7,  9-19,  22-24,  26- 
28,  32,  34-36].  They  permit  one  to  count  the  frequency  of  occurrence  of  a  given  feature 
over  a  small  neighborhood  of  any  given  pixel.  They  have  great  potential  in  image 
classification,  as  they  mimic  certain  visual  cues  that  people  use  when  hand-segmenting 
an  image.  In  particular,  rather  than  considering  pixels  in  isolation,  they  account  for 
the  distribution  of  colors  within  the  neighborhood  of  a  given  location  [26,  27].  The 
focus  of  this  research  is  the  development  of  new  mathematical  theory,  specifically 
that  based  upon  local  color  histograms,  to  be  used  in  the  development  of  image 
classification  systems,  such  as  those  considered  in  [2,  3]. 

Our  focus  is  a  rigorous  mathematical  analysis  of  local  histograms.  This  theory 
arose  during  the  development  of  an  automatic  classification  scheme  for  histology  (Fig¬ 
ure  1(a))  [2,  3,  5,  26-28].  There,  we  considered  the  appropriateness  of  using  local  his¬ 
tograms  as  features  in  the  automated  classification  of  textures  commonly  encountered 
in  images  of  hematoxylin  and  eosin  (H&E)  stained  tissues.  Even  when  a  human  user 
can  easily  delineate  and  identify  the  different  tissues  present  in  such  multiple-tissue 
images  (Figure  1(a)),  segmenting  them  by  hand  (Figure  1(b))  can  be  time-consuming 
and  error-prone.  As  such,  there  is  a  pressing  need  to  automate  as  much  of  this  process 
as  possible. 

One  of  the  main  contributions  of  this  dissertation  is  the  realization  that  local 
histograms — though  difficult  to  analyze  in  general — can  be  well- understood  in  the 
special  case  where  the  images  are  generated  according  to  random  processes.  Simple 
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(a)  (b) 


Figure  1.  (a)  A  digital  microscope  image  of  a  H&E-stained  tissue  section,  (b)  The  histology  image 
has  been  manually  segmented  and  classified  by  a  medical  expert,  resulting  in  the  per-pixel  labels. 
From  darkest  to  lightest,  the  labels  indicate  cartilage,  pseudovascular  tissue,  connective  tissue,  bone, 
fatty  tissue,  and  background  pixels,  respectively.  Our  goal  is  to  automate  this  segmentation-and- 
classification  process.  One  of  the  purposes  of  this  dissertation  is  to  provide  a  theoretical  justification 
for  using  local  histograms  to  achieve  this  goal. 


examples  of  the  types  of  random  images  we  consider  are  given  in  Figure  2(a),  (b),  and 
(c),  denoted  fa,  fb  and  fc,  respectively.  Each  of  these  128  x  128  images  is  generated 
by  the  following  process:  at  each  pixel  location,  a  coin  is  flipped,  resulting  in  either 
a  white  pixel  value  (heads)  or  a  black  pixel  value  (tails).  The  average  intensity  of 
each  image  depends  on  the  fairness  of  the  coin  used  to  generate  it:  letting  p  denote 
the  probability  of  getting  heads,  /„,  fb  and  fc  were  generated  with  p  being  |,  \ 
and  |,  respectively.  By  compositing  /a,  fb  and  fc  according  to  the  values  given  in 
Figure  2(d)  we  obtain  Figure  2(e).  Though  one  can  easily  synthesize  Figure  2(e) 
from  Figure  2(a),  (b),  (c),  and  (d),  the  real- world  goal  is  to  do  the  opposite:  given  an 
image  like  Figure  2(e),  we  want  to  find  the  spatial  relationships  Figure  2(d)  between 
distinct  texture  types  Figure  2(a),  (b),  and  (c). 

The  reason  we  care  about  randomly  generated  synthetic  images  like  Figure  2(e) 
is  because  they  seem  to  be  a  good  model  for  the  types  of  images  that  often  arise 
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Figure  2.  Three  128  x  128  black  and  white  images  (a),  (b)  and  (c)  that  were  randomly  generated  by  a 
sequence  of  Bernoulli  random  variables — coin  flips — with  p  =  \  and  | ,  respectively.  Compositing 

these  three  images  together  according  to  (d)  produces  an  image  (e)  whose  image  statistics  vary  by 
location.  Such  images  model  those  commonly  found  in  digital  microscopy.  We  want  an  algorithm 
to  segment  and  classify  such  images.  That  is,  given  (e)  and  knowing  the  image  statistics  of  each 
individual  texture  (a),  (b)  and  (c),  we  want  to  produce  a  good  approximation  of  (d).  The  theory 
presented  in  this  dissertation  provides  theoretical  justification  for  one  such  algorithm:  assign  a  label 
to  any  given  pixel  by  comparing  its  local  histogram  to  the  three  distinct  distributions  of  pixels  found 
in  (a),  (b)  and  (c). 


in  digital  microscopy  of  biological  tissues  [1,  3,  28].  Indeed,  one  way  to  think  about 
such  real-world  images  is  that  they  are  a  mosaic  of  several  distinct  textures,  each  one 
being  a  relatively  uniform  composite  of  more  basic  texture  types.  For  example,  the 
histology  image  in  Figure  1(a) — thumbnailed  in  Figure  3(a) — is  comprised  of  more 
basic  textures  such  as  cartilage  (Figure  3(b)),  connective  tissue  (Figure  3(c)),  and 
pseudovascular  tissue  (Figure  3(d)).  Moreover,  each  tissue  itself  is  comprised  of  more 
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basic  components,  such  as  the  particular  combination  of  dark  red,  dark  purple,  and 
light  pink  of  typical  pseudovascular  tissue  versus  the  dark  and  light  purple  of  cartilage. 

To  summarize,  our  goal  is  to  segment  and  classify  real-world  images  such  as  Fig¬ 
ure  1(a)  according  to  texture  (tissue)  type;  in  order  to  have  a  mathematically  rigor¬ 
ous  analysis  of  this  problem,  we  model  these  images  as  being  the  result  of  random 
processes  (Figure  2(e)).  In  particular,  we  focus  on  a  rigorous  analysis  of  how  local 
histograms  can  be  used  to  solve  this  problem.  We  favor  histograms  since  they  are  a 
natural  way  of  estimating  the  distribution  of  a  given  texture’s  pixel  values;  as  seen  in 
Figure  3(f),  (g),  and  (h),  such  histograms  are  a  key  discriminating  feature  between 
distinct  texture  types.  It  is  important,  however,  that  these  histograms  are  computed 
locally — over  a  fixed  neighborhood  of  every  pixel  location — since  global  histograms, 
such  as  the  one  depicted  in  Figure  3(e)  derived  from  Figure  3(a),  destroy  spatial  con¬ 
text  by  mixing  all  of  the  individual  distributions  together.  A  similar  issue  arises  in 
time-frequency  analysis:  spectrograms  preserve  spatial  context  while  Fourier  trans¬ 
forms  do  not.  Indeed,  local  histograms  are  similar  to  spectrograms:  in  a  neighborhood 
of  a  given  point,  the  local  histogram  transform  estimates  the  frequency  of  occurrence 
of  a  given  value  while  the  spectrogram  estimates  frequency  in  the  traditional  sense. 

All  of  our  main  results  are  based  on  the  following  idea:  often  the  local  histograms 
of  a  texture  are,  in  fact,  a  combination  of  local  histograms  of  more  basic  textures.  For 
example,  the  local  histograms  of  pseudovascular  tissue  (Figure  3(d))  are  combinations 
of  three  simpler  distributions — one  pink,  another  purple  and  a  third  reddish-pink — 
while  those  of  connective  tissue  (Figure  3(c))  are  mixtures  of  only  the  first  two.  We 
formally  prove  that  this  intuition  is  valid,  provided  we  work  with  randomly  generated 
textures.  This  paves  the  way  for  a  segmentation  and  classification  scheme  that  assigns 
a  texture  label  to  any  given  pixel  according  to  how  its  local  histogram  compares 
against  those  learned  from  training  data. 
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(a)  Histology  image  (b)  Cartilage  (c)  Connective  (d)  Pseudovascular 


(e)  The  red-blue  his-  (f)  The  red-blue  his-  (g)  The  red-blue  his-  (h)  The  red-blue  his¬ 
togram  of  (a).  togram  of  (b).  togram  of  (c).  togram  of  (d). 


Figure  3.  A  1200  x  1200  histology  image  exhibiting  multiple  tissue  types  (a)  and  the  256  x  256 
histogram  of  its  red-blue  (RB)  pixel  values  (e).  As  is  common  with  H&E  staining,  the  tissues  in  (a) 
are  purple-pink  and  so  we  ignore  the  green  component  of  these  red-green-blue  (RGB)  images  when 
computing  (e).  This  histogram  is  viewed  from  above,  with  red  and  blue  ranging  from  0  to  255  on  the 
horizontal  and  vertical  axes,  respectively;  here  the  height  of  the  histogram  is  proportional  to  darkness 
for  the  sake  of  readability.  In  (b),  (c)  and  (d)  we  zoom  in  on  three  128  x  128  patches  extracted  from 
(a),  each  of  which  exhibit  a  single  tissue  type,  namely  cartilage,  connective  tissue  and  pseudovascular 
tissue,  respectively.  Each  of  these  three  tissue  types  has  a  distinct  distribution  of  pixel  values,  as 
evidenced  by  their  corresponding  RB  histograms  (f),  (g)  and  (h).  These  256  x  256  histograms  are 
similar  to  (e),  but  are  only  computed  over  those  points  of  a  given  type  according  to  the  ground  truth 
labels  in  Figure  1(b).  In  particular,  the  histogram  (f)  of  cartilage  (b)  is  computed  over  all  points 
labeled  in  black  in  Figure  1(b).  We  see  that  cartilage  is  darker,  on  average,  than  connective:  (f)  is 
distributed  more  towards  the  lower  left-hand  side  than  (g)  is.  Moreover,  pseudovascular  is  similar  to 
connective,  but  possesses  additional  reddish-pink  structures,  as  evidenced  by  the  subdiagonal  blob 
found  in  (h),  but  not  (g).  As  such,  it  is  plausible  that  local  histograms  can  serve  as  discriminating 
features  in  segmentation-and-classification  algorithms. 
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1.1  Contributions 


In  this  section,  we  highlight  the  main  results  of  this  dissertation  and  discuss  the 
publications  that  have  arisen  from  this  research.  Our  research  focuses  on  provid¬ 
ing  a  rigorous  mathematical  analysis  of  local  histograms.  We  first  show  that  they 
can  be  computed  as  a  system  of  convolutions  (Theorem  1).  In  Theorem  3,  we  find 
a  relationship  between  local  histograms  and  images  that  are  composites  of  several 
distinct  textures:  the  local  histograms  of  an  image  over  a  given  neighborhood  are 
approximately,  on  average,  a  combination  of  the  local  histograms  of  the  tissues  that 
comprise  the  image  in  that  neighborhood.  In  Theorem  4,  we  show  that,  under  cer¬ 
tain  hypotheses,  this  approximation  becomes  perfect,  on  average.  Various  methods 
for  constructing  probabilistic  image  models  for  which  these  hypotheses  hold  are  given 
in  Theorems  6,  7,  and  8.  These  results  and  their  proofs,  along  with  a  proof-of-concept 
segmentation-and-classification  algorithm,  have  been  submitted  in  the  journal  article 
“Local  Histograms  and  Image  Occlusion  Models”  [27]. 

Having  found  some  nice  relationships  between  local  histograms  and  composite 
images,  we  then  characterize  all  (possibly  nonlinear)  transforms  which  satisfy  the 
three  key  properties  of  local  histograms.  In  Theorem  10,  we  formally  show  that  these 
properties  are  almost  unique  to  local  histograms.  In  particular,  we  show  that  local 
histograms  are  essentially  the  only  transforms  that  distribute  over  random  compos¬ 
ites,  on  average.  Having  this  understanding  of  the  average  behavior  of  such  local 
histograms,  we  then  turn  to  the  subject  of  how  closely  this  average  approximates  a 
typical  local  histogram.  That  is,  we  compute  the  variance  of  the  local  histograms  of 
a  composite  image  (Theorem  11),  showing  that  the  distance  of  our  local  histograms 
from  the  average  is  highly  dependent  on  the  scale  of  the  local  histogram  window.  To 
compute  this  variance,  we  necessarily  make  several  simplifying  assumptions  on  the 
nature  of  the  random  images  in  question;  in  Theorem  13,  we  provide  a  method  for 
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constructing  image  models  which  demonstrates  that  these  assumptions  are  indeed 
realistic.  These  results  and  their  proofs  are  presented  in  the  journal  article  “Local 
Histograms  and  Texture  Classification’'  [29]  which  is  in  its  final  stages  of  preparation. 

In  addition  to  the  two  main  journal  articles  mentioned  above,  the  main  results  of 
this  PhD  research  are  also  included  in  two  conference  proceedings.  The  conference 
proceeding  “Local  Histograms  for  Classifying  H&E  Stained  Tissues”  [26]  introduces  a 
rigorous  mathematical  theory  for  the  analysis  of  local  histograms  and  discusses  its  use¬ 
fulness  in  the  automated  classification  of  textures  commonly  encountered  in  histologi¬ 
cal  images.  The  conference  proceeding  “A  Domain-Knowledge-Inspired  Mathematical 
Framework  for  the  Description  and  Classification  of  H&E  Stained  Histopathology  Im¬ 
ages”  [28]  presents  an  overview  of  our  mathematical  framework  for  the  segmentation 
and  classification  of  histology  images. 

In  addition  to  the  major  contributions  arising  from  this  work  in  image  processing, 
the  journal  article  “Fast  Computation  of  Spectral  Centroids,”  [30]  which  represents  a 
refinement  of  the  MS  research  [25],  has  been  published  in  Advances  in  Computational 
Mathematics.  Finally,  the  book  chapter  “Finite  Frames  and  Filter  Banks”  is  to  appear 
in  Finite  Frames:  Theory  and  Application. 
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The  following  is  a  list  of  current  publications  that  have  arisen  during  this  research: 

Journal  Articles 

1.  Massar,  Melody  L.,  Matthew  Fickus,  Erik  Bryan,  Douglas  T.  Petkie,  and  An¬ 
drew  J.  Terzuoli,  Jr.  “Fast  Computation  of  Spectral  Centroids,”  Advances  in 
Computational  Mathematics ,  35:  83-97  (2011). 

2.  Massar,  Melody  L.,  Ramamurthy  Bhagavatula,  Matthew  Fickus,  and  Jelcna 
Kovacevic.  “Local  Histograms  and  Image  Occlusion  Models,”  submitted  to: 
Applied  and  Computational  Harmonic  Analysis,  arXiv:1105.4069,  21  pages. 

3.  Massar,  Melody  L.  and  Matthew  Fickus.  “Local  Histograms  and  Texture  Clas¬ 
sification,”  in  preparation,  to  be  submitted  to:  Journal  of  Fourier  Analysis  and 
Applications. 

Refereed  Book  Chapters 

5.  Fickus,  Matthew,  Melody  L.  Massar,  and  Dustin  G.  Mixon.  “Finite  Frames 
and  Filter  Banks,”  to  appear  in:  Finite  Frames:  Theory  and  Application,  Peter 
G.  Casazza  and  Gitta  Kutyniok  eds.,  Birkhauser,  40  pages. 

Conference  Proceedings 

6.  Massar,  Melody  L.,  Ramamurthy  Bhagavatula,  Matthew  Fickus,  and  Jelcna 
Kovacevic.  “Local  Histograms  for  Classifying  H&E  Stained  Tissues,”  Proceed¬ 
ings  of  the  26th  Southern  Biomedical  Engineering  Conference,  348-352  (2010). 

7.  Massar,  Melody  L.,  Ramamurthy  Bhagavatula,  John  A.  Ozolck,  Carlos  A.  Cas¬ 
tro,  Matthew  Fickus,  Jelcna  Kovacevic.  “A  Domain-Knowledge-Inspired  Math¬ 
ematical  Framework  for  the  Description  and  Classification  of  H&E  Stained 
Histopathology  Images,”  Proceedings  of  SPIE,  8138:  81380U / 1—7  (2011). 


1.2  Outline 


We  begin  in  Chapter  II  with  a  short  review  of  the  background  literature  of  local 
histograms,  occlusions,  and  histology  image  classification.  In  Chapter  III  we  discuss 
local  histograms:  their  basic  properties,  how  to  efficiently  compute  them  using  con¬ 
volutions,  and  how  they  interact  with  occlusions.  In  Chapter  IV,  we  characterize 
all  nonlinear  transforms  which  satisfy  the  three  key  properties  of  local  histograms. 
In  Chapter  V,  we  then  extend  our  theory  by  computing  the  variance  of  the  local 
histogram  transform.  In  the  final  chapter,  we  present  a  preliminary  segmentation- 
and-classification  algorithm  in  which  local  histograms  are  decomposed  using  principal 
component  analysis  (PCA). 
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II.  Literature  Review 


In  this  chapter,  we  discuss  the  literature  relevant  to  our  research.  We  begin 
our  review  in  Section  2.1  with  a  discussion  of  the  background  literature  of  local 
histograms.  Local  histograms  are  used  in  image  processing  to  compute  the  number 
of  times  a  given  feature  occurs  within  a  neighborhood  of  a  given  pixel  [6,  7,  9- 
19,  22-24,  26-28,  32,  34-36].  In  particular,  local  color  histograms  [12,  13,  18,  19, 
26-28,  32,  35,  36]  ,  local  orientation  histograms  [6,  7,  9-11,  15],  and  local  spectral 
histograms  [22-24]  allow  us  to  compute  the  number  of  times  a  given  pixel  intensity, 
edge  orientation,  or  spatial  frequency,  respectively,  occur  within  a  neighborhood  of 
a  given  location.  Background  literature  of  each  of  these  three  types  of  histogram 
transforms  are  discussed  in  detail  below. 

In  Section  2.2,  we  discuss  the  background  literature  of  occlusion  models  [4,  20,  21, 
31,  37,  38].  Such  models  are  based  on  the  ideas  of  representing  an  image  as  a  collage 
of  different  objects.  When  a  texture  can  be  represented  as  an  occlusion  of  more  basic 
components,  we  expect  it  to  interact  well  with  local  histograms,  that  is,  we  expect 
the  local  histograms  of  such  textures  to  be  a  combination  of  the  local  histograms  of 
the  more  basic  components.  The  background  literature  of  several  occlusion  models 
and  their  statistical  properties  are  discussed  below. 

Section  2.3  provides  a  review  on  the  literature  of  per-pixel  texture  classification, 
specifically  classification  of  histology  images  like  the  one  shown  in  Figure  1(a).  We 
consider  the  appropriateness  of  applying  local  histogram  features  to  the  classification 
problem.  Details  of  classification  systems  for  histology  images  are  given  in  [1-3,  5]; 
more  properties  and  details  of  the  mathematical  framework  of  local  histograms  and 
occlusion  models  are  given  in  [26-28]. 
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2.1  Local  Histograms 


Local  histograms  are  a  well-studied  signal  processing  tool.  They  have  applications 
in  image  classification,  image  registration,  computer  graphics,  computer  vision,  etc. 
They  are  used  to  compute  the  frequency  of  occurrence  of  a  given  feature  within  a 
neighborhood  of  a  given  pixel.  Though  they  are  often  used  to  compute  the  distribution 
of  color  within  the  neighborhood  of  a  given  location,  they  can  also  be  used  to  compute 
local  distributions  of  edge  orientations  and  spatial  frequencies.  Before  we  look  at 
these  local  histograms  and  their  applications,  we  first  look  at  some  methods  used  to 
compute  them. 

In  [17,  34],  a  method  for  computing  the  local  histograms  of  a  grayscale  image  is 
presented.  The  computation  involves  convolving  level  sets  of  the  input  image  with 
a  binary  kernel.  From  this,  it  can  be  shown  that  local  histograms  can  be  com¬ 
puted  by  convolving  each  level  curve  of  a  grayscale  image  with  a  binary  kernel;  we 
generalize  this  result  in  Theorem  1(a).  This  method  of  computing  smoothed,  locally- 
weighted  histograms  as  two-dimensional  spatial  convolutions  is  also  used  in  [16],  where 
a  means  for  computing  derivatives  and  integrals  of  locally-weighted  histograms  over 
large  neighborhoods  is  presented. 

In  [18,  19],  histograms  of  continuous  images  are  studied.  Though  this  disserta¬ 
tion  focuses  on  histograms  of  discrete  images,  such  continuous  theory  can  help  one 
understand  the  discrete  theory  at  a  deeper  level.  Histograms  are  usually  computed 
over  discrete  images  as  the  number  of  pixels  in  a  particular  bin.  In  order  to  obtain 
a  “density”  independent  of  the  bin-width,  one  can  divide  the  histogram  by  the  bin- 
width  [18].  In  continuous  images,  it  is  impossible  to  simply  count  the  number  of 
instances  a  given  image  obtains  a  given  pixel  value,  as  this  will  occur  an  infinite  num¬ 
ber  of  times,  in  general.  Rather,  in  continuous  images,  one  instead  computes  areas 
of  image  regions.  The  result  is  normalized  by  dividing  by  the  total  available  area. 
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In  [18],  these  facts  are  illustrated  by  a  few  one-  and  two-dimensional  examples.  This 
idea  is  also  applied  to  local  histograms  of  continuous  images.  In  other  words,  rather 
than  looking  at  the  histogram  of  the  whole  image,  regions  of  interest  are  considered 
by  computing  histograms  over  a  windowed  portion  of  an  image. 

Because  the  set  of  points  where  a  continuous  image  obtains  a  particular  pixel  value 
will  often  have  measure  zero,  the  integral  of  this  particular  indicator  function  is  zero. 
In  [18,  19],  rather  than  integrating  over  this  indicator  function,  they  first  convolve 
the  image  with  a  Gaussian  that  has  a  small  standard  deviation  and  then  integrate 
the  image  that  has  value  one  at  a  given  intensity,  but  slowly  decays  to  zero  for  values 
close  to  the  given  intensity.  That  is,  they  replace  a  level  curve,  which  has  measure 
zero,  with  a  diffuse  ribbon  which  has  a  nontrivial  integral. 

2.1.1  Local  Color  Histograms. 

Local  color  histograms  compute  the  distribution  of  color  within  the  neighborhood 
of  a  given  location.  Such  local  histograms  have  been  used  in  [26-28,  32]  for  seg¬ 
mentation  and  classification.  In  [32],  the  Wasserstein  distance  is  used  to  compare 
the  similarity  of  normalized  histograms.  This  comparison  is  used  in  a  nonparametric 
region-based  active  contour  model  for  segmenting  cluttered  scenes.  Mathematical 
properties  of  this  proposed  model,  along  with  their  proofs  and  experimental  verifica¬ 
tions,  are  also  presented. 

In  [18],  the  idea  of  locally  orderless  images  is  presented.  These  images  discard 
order  at  the  fine  level  and  replace  it  with  order  at  the  coarse  level.  In  other  words, 
the  local  image  structure  is  replaced  with  local  histograms.  They  describe  how  to 
construct  such  images,  render  them,  and  use  them  in  several  local  and  global  image 
processing  operations.  Meanwhile,  in  [12],  the  versatility  of  locally  orderlcss  images 
is  demonstrated  by  considering  a  variety  of  image  processing  tasks,  such  as  variations 
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of  adaptive  histogram  equalization,  methods  for  noise  and  scratch  removal,  texture 
rendering,  classification,  and  segmentation.  In  [19],  a  formal  framework  for  locally 
orderless  images  is  discussed.  It  is  shown  that  in  image  segmentation,  locally  orderless 
images  have  distinct  advantages  over  blurred  images.  In  particular,  when  an  image 
is  blurred,  the  pixels  lose  their  origin  and  the  boundary  is  not  clear.  For  example,  if 
we  blurred  an  image  that  had  a  uniform  green  region  and  a  uniform  red  region,  the 
boundary  between  the  two  regions  would  be  yellow  and  not  belong  to  the  green  or 
red  regions.  However,  in  locally  orderless  images,  the  pixels  do  not  change  their  hue 
and  a  boundary  area  can  be  determined  by  the  overlap  in  regions.  In  addition  to  the 
applications  above,  since  locally  orderless  images  can  be  easily  rendered,  they  may 
also  be  used  in  image  compression. 

A  global  mode  is  the  maximizer  of  a  histogram,  that  is,  the  value  that  occurs  most 
frequently.  Similarly,  a  local  mode  is  the  maximizer  of  a  local  histogram.  In  [36] ,  local 
mode  filtering  is  used  for  noise  reduction.  This  method  of  using  the  local  mode  is 
preferred  over  linear  filtering  and  global  modes  as  it  preserves  edges  and  details  in 
the  image.  Another  method  that  preserves  edges  is  bilateral  filtering  [35] .  Images  are 
smoothed  by  means  of  a  nonlinear  combination  of  nearby  image  values.  In  particular, 
it  replaces  a  pixel  value  with  an  average  of  similar  and  nearby  pixel  values.  In  [13], 
filters  satisfying  certain  image  measurement  requirements  of  scale  and  imprecision  are 
discussed.  Among  these  filters  are  the  mean  filter,  the  median  filter,  and  the  mode 
filter.  In  particular,  a  system  based  on  mode  filtering  and  its  behaviour  over  scale 
and  imprecision  is  presented. 

2.1.2  Local  Orientation  Histograms. 

Histograms  of  oriented  gradients  have  been  used  for  object  detection,  matching 
image  features,  gesture  recognition,  and  texture  classification  [6,  7,  9-11,  15].  The 
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local  orientations  in  the  image  are  obtained  by  calculating  the  first  derivatives  in  two 
orthogonal  directions.  Because  the  Gaussian  has  a  smoothing  effect  and  the  amount 
of  noise  reduction  can  be  controlled  by  the  scale  parameter,  the  partial  derivatives  in 
the  image  are  often  calculated  by  filtering  the  image  in  the  x  and  y  directions  with 
the  filters  that  implement  the  derivatives  of  the  Gaussian  function.  Edges  and  corners 
can  also  be  found  using  image  moment  transforms;  properties  of  such  transforms  for 
both  continuous  and  digital  images  are  discussed  in  [8]. 

Local  object  appearance  and  shape  can  often  be  characterized  by  the  local  his¬ 
tograms  of  gradient  directions  or  edge  orientations.  In  particular,  for  better  invariance 
to  illumination  and  shadow,  the  local  histograms  can  be  contrast-normalized  giving  us 
histograms  of  oriented  gradient  (HOG)  descriptors.  In  [7],  HOG  descriptors  are  used 
as  feature  sets  in  a  linear  support  vector  machine  (SVM)  and  outperform  the  existing 
feature  sets  used  for  object  detection.  More  details  on  the  best  spatial  sampling, 
orientation  sampling,  and  contrast-normalization  method  are  also  discussed.  In  [6], 
a  method  for  matching  image  features  is  proposed.  Rather  than  using  the  common 
method  of  scale-invariant- feature-transform-descriptors  (SIFT),  they  propose  to  use 
a  new  method  that  will  reduce  the  negative  effect  of  scale  error.  The  method  involves 
modifying  the  HOG  descriptor’s  regular  grid  to  an  irregular  grid.  In  [10],  a  method 
using  orientation  histograms  to  recognize  hand  gestures  is  presented. 

Local  orientation  histograms  have  also  been  used  in  texture  classification  [9,  11, 
15].  In  [14],  histograms  of  multiple  resolutions  of  an  image  are  computed  to  form 
a  multiresolution  histogram.  Such  histograms  directly  encode  spatial  information 
while  still  preserving  several  desirable  properties  of  the  histogram,  such  as  being  fast 
to  compute,  achieving  significant  data  reduction,  being  invariant  to  rigid  motions,  and 
being  robust  to  noise.  In  [9,  11,  15],  this  idea  of  computing  multi-resolution  histograms 
is  applied  to  local  image  orientation.  Because  the  dominant  orientation  depends 
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strongly  on  the  observation  scale,  the  macro-texture  is  evaluated  by  calculating  the 
distribution  of  the  dominant  orientations  for  all  pixels  in  the  image  that  sample  the 
texture  at  a  micro-level. 

2.1.3  Local  Spectral  Histograms. 

Local  spectral  histograms  are  used  for  image  and  texture  segmentation  and  tex¬ 
ture  classification  [22-24],  The  method  for  computing  local  spectral  histograms  is 
discussed  in  [22,  23].  The  spectral  histogram  is  defined  with  respect  to  a  bank  of 
Liters.  A  filter  selection  algorithm  that  maximizes  classification  performance  is  pro¬ 
posed  in  [22],  Filters  that  have  shown  to  be  effective  for  different  kinds  of  textures 
include  the  intensity  filter,  difference  or  gradient  Liters,  Laplacian  of  Gaussian  Liters, 
and  Gabor  Liters  with  both  sine  and  cosine  components  [22], 

To  compute  the  local  spectral  histogram,  a  sub-band  image  is  computed  for  each 
Liter  by  taking  a  window  in  an  input  image  and  linearly  convolving  it  with  each  Liter. 
Then  the  histogram  of  each  of  the  sub-band  images  is  computed.  The  collection  of 
these  histograms,  appropriately  scaled,  is  called  the  spectral  histogram  of  an  image 
window.  In  other  words,  the  spectral  histogram  of  an  image  window  is  a  vector 
consisting  of  marginal  distributions  of  Liter  responses. 

The  distance  between  two  spectral  histograms  is  computed  using  a  y2-statistic. 
Applying  this  distance  measure  to  spectral  histograms  provides  a  means  for  texture 
classiLcation.  Several  properties  of  spectral  histograms  are  proved  in  [22]:  it  is  shown 
that  the  spectral  histogram  is  translation-invariant  and  a  nonlinear  operator.  Addi¬ 
tionally,  with  sufficient  Liters,  a  spectral  histogram  can  uniquely  represent  any  image 
up  to  a  translation.  Finally,  all  of  the  images  sharing  a  spectral  histogram  deLne  an 
equivalence  class. 

The  process  of  applying  local  spectral  histograms  for  image  and  texture  segmen- 
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tation  in  [23,  24]  includes  three  major  steps.  First,  an  initial  segmentation  result  is 
obtained  by  classifying  local  windows.  This  is  done  by  computing  the  local  spectral 
histograms  of  a  subsample  of  pixels  and  then  using  the  distance  measure  to  determine 
the  classification  of  the  pixels.  After  this  first  step,  an  algorithm  is  used  to  iteratively 
update  the  segmentation  using  the  derived  probability  models.  The  final  step  involves 
reducing  the  boundary  uncertainty.  In  particular,  the  large  spatial  window  used  for 
spectral  histograms  can  cause  a  large  segmentation  error  to  occur  on  the  boundary 
of  the  textures.  The  region  boundaries  are  localized  by  building  refined  probability 
models  that  are  sensitive  to  spatial  patterns  in  segmented  regions. 

2.2  Occlusions 

In  this  dissertation,  we  formally  study  how  local  histograms  interact  with  tex¬ 
tures  that  are  formed  as  the  occlusion  of  simpler  components.  Intuitively,  the  local 
histograms  of  the  texture  are  related  to  the  local  histograms  of  the  components  that 
comprise  the  texture.  Indeed,  if  we  could  break  apart  the  histogram  of  the  texture, 
we  could  gain  information  about  what  components  are  present  and  to  what  degree. 
We  now  give  an  overview  of  the  literature  on  different  occlusion  image  models  and 
their  applications. 

Certain  image  textures  can  be  thought  of  as  occlusions  of  more  basic  components, 
that  is,  images  are  collages  of  statistically  independent  objects  [20].  Several  statistical 
models  for  occlusions  have  been  presented  [4,  20,  21,  31,  37,  38].  The  goal  of  the 
models  is  to  simulate  images  that  well-approximate  the  statistics  of  natural  images. 

In  [20],  a  stochastic  model  that  takes  occlusions  into  account  and  is  both  translation- 
invariant  and  fully  scale-invariant  is  presented.  Comparing  the  statistics  of  the  simu¬ 
lated  images  of  the  occlusion  model  with  statistics  of  natural  images  shows  that  the 
single  pixel  statistics  and  the  derivative  statistics  agree  well  if  the  simulated  images 
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are  block  averaged  a  few  times  for  suitable  subpixel  resolution.  Models  of  images  that 
capture  the  invariance  properties  of  real  scenes  could  be  used  in  image  compression 
and  in  the  study  of  sensory  processing  in  biology. 

In  [21],  an  occlusion  model  based  on  the  notion  that  the  three-dimensional  world 
creates  a  collage  of  occluded  objects  is  presented.  This  model  is  called  the  dead 
leaves  model  and  resembles  images  that  were  formed  by  randomly  adding  independent 
elementary  shapes  in  layers.  A  comparison  of  the  statistics  of  the  occlusion  model 
with  the  empirical  statistics  of  two  large  databases  of  natural  images  gives  excellent 
qualitative  and  good  quantitative  agreement.  Such  statistics  have  applications  in 
computational  and  biological  vision.  The  image  model  comes  close  to  duplicating 
several  elementary  statistics  of  natural  images  and,  at  that  time,  was  the  only  image 
model  to  do  so.  A  result  characterizing  the  distribution  of  the  boundary  between  the 
shapes  generated  by  the  dead  leaves  model  is  given  in  [4], 

The  apparent  scale-invariance  exhibited  by  the  statistics  of  natural  images  is  dis¬ 
cussed  in  [31].  In  particular,  there  do  not  exist  any  scale-invariant  probability  mea¬ 
sures  supported  on  image  functions.  In  order  to  construct  such  probability  measures, 
their  samples  must  be  generalized  functions.  The  basic  idea  of  the  paper  is  to  assume 
that  an  image  having  clutter  ci  +  C2  can  be  constructed  by  adding  independent  im¬ 
ages  with  clutter  c i  and  C2.  Image  models  having  this  property  are  studied  and  a  few 
axioms  for  such  models  are  presented.  These  axioms  are  shown  to  be  satisfied  using 
the  convergence  of  random  wavelet  expansions.  The  authors  of  [31]  believe  that  the 
axioms  present  a  natural  model  for  images  that  is  closer  to  the  truth  than  Gaussian 
models,  but  nevertheless  do  not  capture  all  the  basic  qualitative  properties  of  such 
images. 

In  [37,  38],  a  statistical  approach  for  partially  occluded  object  recognition  is  pre¬ 
sented.  This  approach  is  based  on  matching  extracted  local  features  to  template 
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features.  Two  different  statistical  occlusion  models  are  introduced  and  experiments 
illustrating  the  differences  between  the  two  occlusion  models  are  presented.  In  [38], 
model-based  statistical  algorithms  for  recognition  are  used  on  examples  from  syn¬ 
thetic  aperture  radar  imagery  to  evaluate  the  recognition  performance  and  illustrate 
their  advantages  over  other  algorithms. 

2.3  Per-Pixel  Texture  Classification 

Per-pixcl  classification  is  important  in  many  applications.  Classification  of  aerial 
images  has  applications  in  automatic  target  recognition  (ATR)  and  image  registration. 
In  particular,  one  of  the  long-term  goals  of  this  research  is  to  apply  local  histogram 
features  to  a  problem  of  passive  navigation  and  surveillance,  namely  the  development 
of  an  image-based  passive  navigation  system  which  performs  a  real-time,  per-pixcl 
classification  of  the  surrounding  terrain,  and  registers  the  result  to  an  internally  stored 
map.  Up  to  this  point,  our  work  in  this  area  has  focused  on  a  similar  biomedical  image 
processing  problem  [1-3,  5],  and  has  been  performed  in  conjunction  with  Professor 
Jelcna  Kovacevic’s  group  at  Carnegie  Mellon  University.  The  focus  of  [26-28]  is 
to  present  a  rigorous  mathematical  theory  for  the  analysis  of  local  histograms  and 
consider  the  appropriateness  of  their  use  in  the  classification  of  histology  images, 
whereas  [1-3,  5]  focus  more  on  the  classification  system  and  the  method  used  to 
automatically  identify  and  delineate  histology  images. 

In  [26-28] ,  we  discuss  some  of  the  many  image  features  that  pathologists  indicate 
they  use  when  classifying  tissues  by  hand.  We  present  a  probabilistic,  occlusion-based 
model  for  textures  which  shows  how  tissue-similar  textures  can  be  built  from  simpler 
ones;  details  of  this  work  are  given  in  Chapter  III.  After  proving  some  properties  of 
local  histograms,  we  discuss  how  they  relate  to  our  model.  We  also  discuss  how  local 
histogram  transforms  can  be  used  as  numerical  features  in  the  classification  system 
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to  mimic  the  pathologist’s  thought  process. 

In  [1-3,  5] ,  a  framework  and  methodology  for  the  automatic  identification  and  de¬ 
lineation  of  histology  images  is  presented.  Although  pathologists  can  accurately  and 
consistently  identify  and  delineate  tissues  and  their  pathologies,  it  is  an  expensive 
and  time-consuming  task,  therefore  presenting  the  need  for  an  automated  algorithm. 
In  [3],  an  example  of  such  an  algorithm  is  validated  on  a  clinical  and  research  applica¬ 
tion  and  experimental  results  show  great  promise  towards  developing  an  automated 
tool  for  digital  histopathology.  We  now  present  our  work  on  the  analysis  of  local 
histograms. 
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III.  Local  Histograms  and  Image  Occlusion  Models 


In  this  chapter,  we  discuss  some  of  the  basic  properties  of  local  histograms.  In 
particular,  we  show  that  local  histograms  can  be  computed  using  convolutions.  We 
also  introduce  a  model  for  textures  that  relies  on  the  concept  of  image  occlusions, 
and  present  one  of  our  main  results,  that  is,  on  average,  the  local  histograms  of  the 
composite  of  a  sequence  of  images  can  be  approximated  by  a  convex  combination — a 
linear  combination  with  nonnegative  coefficients  that  sum  to  one — of  the  local  his¬ 
tograms  of  each  image.  We  discuss  the  conditions  under  which  the  approximation 
becomes  perfect,  namely,  when  the  expected  value  of  the  graph’s  characteristic  func¬ 
tion  is  independent  of  pixel  location,  a  property  we  call  flatness.  We  also  consider 
what  properties  of  occlusion  models  give  us  flatness  and  what  operations  can  be  used 
to  build  intricate  flat  occlusion  models  from  more  simple  examples. 

3.1  Local  Histograms 

In  this  section,  we  formally  introduce  local  histograms,  discuss  an  efficient  means 
of  computing  them,  and  discuss  several  of  their  basic  properties.  We  regard  our 
images  as  functions  from  a  finite  abelian  group  X  of  pixel  locations  into  a  second 
finite  abelian  group  y  of  pixel  values.  That  is,  an  image  /  is  a  member  of  the  set 
yX  {/  :  X  y  3d-  For  example,  the  1200  x  1200,  8-bit  red-green-blue  (RGB)  image 
given  in  Figure  1(a)  has  X  =  Zf200  =  Z1200  ©  ^1200  and  y  =  Z256,  where  Z tv  denotes 
the  cyclic  group  of  integers  modulo  N.  Here,  we  note  that  ©  denotes  the  direct  sum  of 
abelian  groups.  That  is,  it  is  the  Cartesian  product  equipped  with  pointwise  addition; 
formally,  X  ©  y  :=  {(x,y)  :  x  G  X,y  e  (T},  where  (x,y)  +  (. x',y ')  :=  (x  +  x',y  +  y'). 
For  purple-pink  H&E-stained  images,  we  often  omit  the  green  channel  for  the  sake 
of  computational  efficiency,  at  which  point  y  becomes  Z256.  The  local  histograms  of 
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an  image  /  are  defined  in  terms  of  a  weighting  function  w  G  M*.  In  practice,  this 
function  is  usually  chosen  to  be  a  discretized  Gaussian  or  some  nonnegative-valued 
function  whose  values  sum  to  one.  However,  for  the  purpose  of  generality,  we  assume 
w  to  be  arbitrary  unless  stated  otherwise.  Specifically,  the  local  histogram  transform 
of  /  with  respect  to  w  is  the  function  LH wf  :  X  ©  y  — *  M, 

(LH wf)(x,y)  :=  w(x')5y(f(x  +  x')),  (1) 

x'ex 

where  Sy(y')  1  if  y'  —  y  and  is  otherwise  zero.  For  any  fixed  x  G  X,  the  cor¬ 
responding  cross-section  of  this  function,  namely  (LH^/)(x,  •)  :  y  — »  M,  counts  the 
number  of  instances  at  which  /  obtains  a  given  value  y  in  a  re-neighborhood  of  x. 

Computing  local  histograms  can  be  time  consuming,  especially  as  X  and  y  become 
large.  Noting  that  \X\  denotes  the  cardinality  of  X,  we  have  that  for  a  general 
weighting  function  w,  a  direct  computation  of  (1)  requires  C>(|A’|2| ^|)  operations: 
summing  the  real  values  w{x')  for  all  x'  G  X  such  that  f(x  +  x')  =  y.  A  more 
efficient  method  is  given  in  Theorem  1  below:  (1)  can  be  computed  as  a  system  of 
\y\  convolutions  over  X,  which  only  requires  (9(|T||3^|  log  \X\)  operations  if  discrete 
Fourier  transforms  are  used.  In  particular,  we  filter  the  characteristic  function  of  the 
graph  of  /,  namely  If  :  X  ©  y  — »  M, 

1 f(x,y)  ■=  lf-i{v}(x)  =  5y(f(x))  =  <  ^  ^  Jl  (2) 

[  o,  fix)  y  y, 

with  the  reversal  of  w  G  namely  w(x)  :=  w(—x).  Here,  we  let  f~l{y}  denote  the 
preimage  of  { y }  under  /,  that  is,  the  set  of  all  x  G  X  such  that  f(x)  =  y.  We  note 
that  the  characteristic  function  does  not  distribute  over  addition  and  is  thus  nonlinear. 
Therefore,  we  have  that  local  histograms  are  also  nonlinear;  a  fact  which  can  easily  be 
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verified.  This  method  for  computing  local  histograms  using  convolutions  is  illustrated 
in  Figure  4.  Alternatively,  (1)  can  be  computed  as  a  single  convolution  over  X  ©  y-, 
here,  the  tensor  product  of  w  G  M*  with  to  G  is  defined  as  w  <E>  to  G 
(w  Cg  to)(x,  y )  :=  w(x)uo{y). 

Theorem  1.  For  any  w  G  Rx ,  to  G  Ry ,  f  G  yx ,  x  G  X,  and  y  G  T-' 

(a)  Local  histograms  (1)  can  be  evaluated  as  a  system  of  \y\  convolutions  over  X : 

(LH  wf)(x,y)  =  (lf-i{y}*w)(x). 


(b)  Alternatively,  (1)  may  be  computed  as  a  single  convolution  over  X  ©  Jb 

(<5o  <£><*;)  *  LH wf  —  If  *  (w  Cg  to). 

In  particular,  taking  to  =  50  gives  LH wf  —  If  *  (w  0  (50). 

Proof.  For  (a),  replacing  x'  with  —x',  and  substituting  5y(f(x  —  x'))  =  lf-i{y}(x  —  x'), 
a  relation  which  follows  from  (2),  into  (1)  yields: 

(LH wf)(x,y)  =  ^  w(x')5y(f(x  +  x')) 

x’ex 

=  J2w(-x')lf-l{y}(x^x') 
x’ex 

=  ^2w(x')lf-i{y}(x-x') 
x’ex 

=  (w*lf-i{y})(x).  (3) 
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Figure  4.  An  example  of  how  to  compute  local  histograms  using  Theorem  1(a).  For  the  sake 
of  readability,  larger  numerical  values  are  represented  by  darker  shades  throughout.  The  source 
image  (far  left)  /  is  6  x  8  and  has  grayscale  values  ranging  from  0  to  4.  That,  is  f  £  yx  where 
X  =  Z6  ®  Z8  and  y  =  Z5.  Its  characteristic  function  (2)  is  a  {0,  l}-valued  6x8x5  data  cube 
whose  cross-sections  (left  column)  indicate  those  locations  at  which  /  attains  any  given  value.  By 
Theorem  1(a),  the  6x8x5  data  cube  that  contains  the  local  histograms  of  /  (far  right)  may  be 
computed  one  level  at  a  time  (right  column)  by  filtering  these  binary-valued  cross-sections  with 
a  real-scalar- valued  weighting  function  (middle  column).  In  this  simple  example,  the  weighting 
function  is  w  =  ^<5o,o  +  |(<5-i,o  +  <$i,o  +  <5o,-i  +  <5o,i ) ,  where  the  origin  lies  in  the  upper  left-hand 
corner  of  the  grid. 
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For  (b),  the  definition  of  <50  gives: 


[(50  ®uj)  *  LRwf](x,  y)  —  ^  (So  ®u){x\y')(Uiwf)(x-x',y-y') 

(x',y')ex®y 

=  ^2UJ(y')(LE.wf)(x,y-y').  (4) 

y'ey 

Substituting  (3)  into  (4)  and  using  (2),  gives  our  result: 

[(50  ®u;)  *  LH wf](x,y)  =  ^  *  1f~1{y-y'})(x) 

y’&y 

=  ™w)if-Hv-v,}(x  - x> ) 

y'ey  X'ex 

=  (' w®u)(x\y')lf(x-x\y-y ') 

{x',y')&X®y 

=  [(w®uj)  *  1  f\(x,y). 

For  the  final  conclusion,  we  note  that  when  u  —  Sq.  it  follows  from  this  result  that 
LH wf  =  (w  <g>  S0)  *lf-  In  other  words,  the  local  histogram  can  be  computed  by  con¬ 
volving  the  graph’s  characteristic  function  (2)  with  a  filter  w®S o  which  is  completely 
supported  on  one  cross-section  of  X  ©  y  where  y  =  0.  □ 

In  our  next  result,  we  make  use  of  the  translation  operator  Tx  :  Z ^  — >  Z^, 
T X(p(x')  :=  tp(x/  —  x ).  Here,  we  summarize  several  other  basic  properties  of  local 
histograms: 

Proposition  2.  For  any  w  €  M*  and  f  e  yx  : 

(a)  If  the  values  of  w  sum  to  1,  then  the  levels  of  a  local  histogram  transform  also 
sum  to  1: 


^(LHu;/)(a;,S/)  =  l,  Vz  e  X. 
y&y 
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(b)  Local  histograms  commute  with  spatial  translation  T,T: 


LHWT*  =  T^LH*,,  Vx  G  X. 

(c)  Adding  constants  to  images  shifts  their  local  histograms  along  y-. 

LHW(/  +  y)  =  T^LRwf,  Vy  G  y. 

(d)  Quantizing  an  image  will  bin  its  local  histograms: 

[LRw(qo  f)](x,y')  =  ^  (LH wf)(x,y),  \/q  :  y  y' . 

yty 

q(y)=y' 

Proof.  For  any  w  G  M*  and  /  G  yx\ 

(a)  Let  x  G  X  and  assume  that  'f2xiexw(x')  =  1-  By  interchanging  sums  and 
noting  that  for  some  fixed  x'  G  X,  f(x  +  x')  is  equal  to  one  and  only  one  y  G  y, 
it  follows  that: 

5^(lh wf)(x, y)  =  ^2Yl  w(x')W(x  +  x'))  =  w (x')  =  L 

yey  y&y  x'eX  x'eX 

(b)  Let  (x',y)  G  X  ©  y.  The  result  follows  directly  from  the  definitions  of  local 
histogram  and  translation: 

(LH.T7) (*',</)=  T  w(x")S,(T’f(x'  +  x")) 

x"ex 

=  (LH wf)(x'  -  x,y) 

=  (T(a’0)LH  wf){x',y). 
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(c)  Let  (x,y')  G  X  ©  y.  First  note  that 


Sy'((f  +  y)(x  +  x')) 


1,  if  +  y)(x  +  x')  =  y', 
0,  (f  +  y)(x  +  x')  ^y', 

1,  f(x  +  x')  =  y'  -y, 

0,  f(x  +  x')  ±  y'  -y, 


=  $y'-y(f(x  +  x')). 


(5) 


Combining  the  definition  of  the  local  histogram  with  (5)  gives  our  result: 
[LH w(f  +  y)\(x,y')  =  Y  w(x')Sy'-y(f(x  +  x')) 

x'ex 

=  (LH wf)(x,y'  -y) 

=  (T^LH  wf)(x,y'). 


(d)  We  claim  that: 


f)(x  +  x'))  =  Y  W(x  +  x')).  (6) 

y&y 

q(y)=y' 

For  fixed  x,x',y',f,  and  q,  the  left-hand-side  of  (6)  equals  one  if  and  only  if 
( q  o  f)(x  +  x')  =  y' ,  that  is,  if  and  only  if  there  exists  some  y  G  y  such  that 
f(x  +  x')  —  y  and  q(y)  =  y',  implying  the  right-hand-side  of  (6)  is  also  one. 
Alternatively,  if  the  left-hand-side  of  (6)  is  zero,  then  f(x  +  x')  does  not  equal 
any  y  such  that  q(y)  =  y'  and,  therefore,  the  right-hand-side  of  (6)  is  also  zero. 
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This  claim,  along  with  (1),  gives  our  result: 


[LRw(qo  f)\(x,y')  =  ^  ^  w(x')5y(f(x  +  x1))  =  J^(LH wf)(x,y).  □ 

y&y  x'£X  y£y 

i(y)=y'  <j{y)=y' 

In  the  following  section,  we  define  a  model  for  the  occlusion  of  a  sequence  of  images 
and  present  one  of  our  main  results:  on  average,  the  local  histogram  of  the  occlusion 
of  a  sequence  of  images  can  be  approximated  as  a  convex  combination  of  the  local 
histograms  of  each  image. 

3.2  A  Probabilistic  Image  Occlusion  Model 

In  this  section,  we  rigorously  confirm  our  intuition  regarding  local  histograms  of 
textures  generated  via  random  occlusions:  if  a  texture,  such  as  that  found  in  the  pseu- 
dovascular  tissue  of  Figure  3(d),  is  some  sufhciently-spatially-random  combination  of 
50%  pink  pixels,  25%  purple  pixels  and  25%  red  pixels,  then  its  local  histograms 
should,  on  average,  be  a  mixture  of  three  simpler  distributions,  namely  a  convex 
combination  of  0.5  of  a  purely  pink  distribution  with  0.25  purely  purple  and  red  ones. 
We  rigorously  show  that  such  decompositions  of  local  histograms  indeed  exist  for 
textures  arising  from  a  certain  class  of  probabilistic  image  models. 

To  see  how  to  formalize  these  ideas,  we  consider  the  example  given  in  the  intro¬ 
duction  (Figure  2)  where  at  each  pixel  location,  a  coin  is  flipped,  resulting  in  either  a 
white  pixel  value  (heads)  or  a  black  pixel  value  (tails).  One  expects  that,  on  average, 
the  local  histogram  at  any  point  will  consist  of  two  peaks:  one  in  the  white  portion 
of  y,  and  one  in  the  black.  Such  an  image  can  be  regarded  as  the  result  of  occluding 
a  solid  black  image  /o  with  a  solid  white  one  /p  at  each  pixel,  the  flip  of  a  coin  de¬ 
termines  whether  f\  lies  on  top  of  /o  at  that  point,  or  vice  versa.  More  generally,  the 
occlusion  of  a  set  of  N  images  {fn}n=o  in  yX  with  respect  to  a  given  label  function 
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( p  G  is: 


(7) 


(occ^j  fn}n=o)(x)  1=  f<p{x)(x). 

That  is,  at  any  pixel  location  x,  the  label  yp(x)  determines  which  of  the  potential 
pixel  values  {fn(x)}n=o  actually  appears  in  the  composite  image  occv{fn}^=Q  at 
that  point. 

The  main  results  of  this  dissertation  are  concerned  with  when  the  local  his¬ 
tograms  (1)  of  a  composite  image  (7)  are  related  to  the  local  histograms  of  the 
individual  images  fn.  Though  it  is  unrealistic  to  expect  a  clean  relation  for  any 
fixed  <p,  we  can  show  that  these  quantities  are  indeed  closely  related,  provided  one 
averages  over  all  possible  label  functions  <p.  Indeed,  denoting  the  probability  of  get¬ 
ting  “heads”  in  the  above  toy  example  as  p  e  [0,1],  we  would  expect  the  volumes 
of  the  white  and  black  peaks  of  the  composite  image’s  local  histograms  to  be  p  and 
1  —  p,  respectively.  That  is,  LH^occ^j/o,  /1}  should  be  (1  —  p)LH„,/0  +  pLHwf\ ,  on 
average.  We  generalize  this  idea  so  as  to  permit  more  realistic  textures  with  more 
colors  and  with  spatially-correlated  pixels. 

To  be  precise,  fix  a  set  of  source  images  {fn}n=o  an(l  consider  the  set  of  all  possible 
composite  images  {occ^l/nj^To1}^^  as  defined  in  (7)  and  obtained  by  letting  be 
any  one  of  the  elements  of  Z^.  We  refer  to  a  random  method  for  choosing  one  of 
these  composites  as  an  occlusion  model  <f>.  Formally  speaking,  <f>  is  a  random  variable 
version  of  the  label  function  <p.  That  is,  we  work  in  the  probability  space  (£7,  T,  V), 
where  the  sample  space  £7  is  the  set  Z^  of  all  functions  from  X  to  Z^r,  the  o- algebra 
T  is  the  power  set  of  Z^,  and  the  probability  measure  V  is  defined  as: 


iex 


where  P$  is  the  corresponding  probability  density  function  on  Z^,  that  is,  it  is  a 


nonnegatively- valued  function  such  that  P^V7)  =  1-  Going  back  to  our  coin¬ 

flipping  example,  the  role  of  the  occlusion  model  is  to  determine  the  likelihood  of 
choosing  a  particular  p  :  X  — >  Z2.  In  particular,  in  a  Bernoulli  model ,  we  fix  pi  e 
[0, 1],  let  p0  :=  (1  —  Pi)  and  define  the  probability  of  picking  any  particular  p  e  Z * 
as  P$(<p)  =  El xex  P<p{x)-  We  emphasize  here  that  though  this  method  of  producing 
images  is  random,  it  is  by  no  means  uniform:  the  role  of  the  model  is  to  emphasize 
certain  images  over  others.  In  particular,  when  pi  —  \  the  Bernoulli  model  will  often 
produce  images  similar  to  Figure  2(a),  but  will  almost  never  produce  images  like 
Figure  2(b)  and  (c).  Indeed,  Figure  2(b)  and  (c)  arise  from  distinct  Bernoulli  models 
in  which  p\  =  \  and  p\  —  |,  respectively.  Images  such  as  Figure  2(e)  are  therefore 
regarded  as  composites  of  images  arising  from  several  distinct  models.  From  this 
perspective,  our  segmentation  and  classification  algorithm  boils  down  to  using  local 
histograms  to  estimate  which  of  these  three  models  is  being  exhibited  at  any  pixel. 

For  a  more  realistic  example,  imagine  three  128  x  128  images  /0,  f\  and  /2  which 
exhibit  nearly  constant  shades  of  pink,  purple  and  red,  respectively.  Given  any  label 
function  p  :  Z228  —> y  Z3  we  can  produce  a  corresponding  128  x  128  composite  image 
occ^l/o,  /i,  /2}  whose  pixels  are  some  mixture  of  pink,  purple  and  red.  For  some 
choices  of  p  the  resulting  composites  will  look  like  the  pseudovascular  tissue  texture 
given  in  Figure  3(d).  However,  even  in  this  small  example,  there  are  an  enormous 
number  of  such  possible  composites — one  for  each  of  the  312§2  possibilities  for  p — and 
only  a  few  of  these  will  look  like  pseudovascular  tissue;  most  will  appear  as  pink- 
purple-red  static.  The  role  of  the  occlusion  model  $  is  to  assign  a  probability  to 
each  of  these  possible  p's  in  a  manner  that  emphasizes  those  textures  one  expects  to 
appear  in  a  given  tissue  while  de-emphasizing  the  rest. 

To  formally  confirm  our  intuition  that  local  histograms,  on  average,  should  dis¬ 
tribute  over  random  occlusions,  fix  any  set  of  N  source  images  {fn}n=o  and  let  ® 
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be  any  occlusion  model  as  defined  in  (7).  That  is,  let  $  be  a  random  variable  ver¬ 
sion  of  a  label  function  p  :  X  — y  Zjv,  as  defined  by  a  probability  density  function 
P$  :  — »  [0, 1]  where  Yhipez*  P^V7)  =  1-  In  the  results  that  follow,  a  useful  quantity 

to  consider  is  the  expected  value — with  respect  to  P$ — of  the  characteristic  function 
lv  obtained  by  letting  /  =  p  in  (2): 

l<s>(x,n)  :=  ^  P$(v?)l !p(x,n)  =  ^  P$  (</?)■  (8) 

( p(pc)=n 

Essentially,  l$(x,  n)  is  the  probability  that  a  random  label  function  99  generated  by 
the  occlusion  model  <f>  will  assign  label  n  to  pixel  location  x.  That  is,  for  any  fixed  x 
and  n,  1  $(x,n)  is  the  expected  value  of  1  v(x,n).  More  generally,  for  any  real-valued 
function  /3  over  Z^,  the  expected  value  of  /3  is: 


E*(/3)  :=  ^  (9) 

Having  these  concepts,  we  present  one  of  our  main  results: 

Theorem  3.  For  any  sequence  of  images  {fn}n=o  yX  >  weighting  function  w  arid 
any  N -image  occlusion  model  <f>,  the  expected  value  (9)  of  the  local  histogram  (1)  at 
any  given  (. x,y )  G  X  ©  y  of  the  composite  image  (7)  with  respect  to  w  is: 

N—l 

E$(LH^occ${/„}^r01)(a;,  y)  =  ^  1$  (x,  n)  (LHw/n)  (x,  y)  +  £,  (10) 

n= 0 


where  the  error  term  e  is  bounded  by  £  < 
Moreover, 

N- 1 


N—l 

EE  |w(a;/)||l$(a:  +  x',n)  —  l$(x,  n)|. 

n= 0  x'&X 


J^U(a:,n)  =  l,  (11) 

n=0 
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and  so  (10)  states  that,  the  local  histograms  of  the  composite  image  occ^fn}^"^ 
can,  on  average,  be  approximated  by  convex  combinations  of  local  histograms  of  each 
individual  image  fn. 

Proof.  The  expected  value  of  the  local  histogram  (1)  of  a  composite  image  (7)  is: 

E$(LH„,occ${/r)}()r=r01)(a;, y)  =  E  E  w(x>)Sy((occAfn}n=o)(x  +  x'))-  (12) 

x'ex 

For  any  fixed  <p,  x,  and  x' ,  we  have  ip(x  +  x')  =  n  for  exactly  one  n.  For  any  fixed 
x,  x'  and  y,  we  can  therefore  split  a  sum  of  lv(x  +  x' ,n)Sy(fn(x  +  x'))  over  all  n  into 
one  summand  where  n  =  ip(x  +  x')  and  the  remaining  N  —  1  summands  for  which 
n  ^  ip(x  +  x'): 

N-l 

E  l^x  +  x'i  n)5y(fn(x  +  x'))  =  (l)Sy(fv{x+x')(x  +  x'))  +  E  (Q)sv(fn(x  +  x')) 

77=0  n^<p{  x-\-x') 

=  dy((oCC  Mn}n=o)(x  +  O),  (13) 

where  the  final  equality  follows  immediately  from  (7).  Substituting  (13)  into  (12)  and 
using  (8)  yields: 

N—l 

E$(LHu;OCC ®{fn}n=o)(x,y)  =  E  E  W E  l^X  +  X'in)Wn(x  +  x')) 

pez*  x'GX  n= 0 

N-l 

=  E  E  W(X')Sy(fn(x  +  x'))  E  +  x'  ^n) 

n= 0  x'£X 
N-l 

=  E  E  w(x')fiy(fn(x  +  a:'))U(a:  +  a/, n).  (14) 

n=  0  aj'eV 

iV-1 

Rewriting  (14)  in  terms  of  £  :  =  EE  w(x,)8y(fn(x  +  x/))[l$(a;  +  x',n)  —  l$(x,n)] 

n= 0  x'ex 
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gives  our  first  claim  (10): 


N-l 

E$(LHwocc${/n}()r=r01)(a;,  y)  =  EE  w(x')5y(fn(x  +  x'))l^(x,  n)  +  £ 

n= 0  x'GX 
N-l 

=  22  n)  w(x')fiy(fn(x  +  x'))  +  £ 

n= o  yea: 

JV— 1 

=  ^U(x,n)(LH  wfn){x,y)  +e. 
n— 0 


For  the  second  claim,  we  bound  e  using  the  triangle  inequality  and  the  fact  that 
\8y(fn(x  +  x'))\  <  1: 


£ 


N-l 

EE  w(x')8y(fn(x  +  x'))[l$(x  +  x',n) 

n= 0  x’&X 


N-l 

<  '22  2^  l'U7(a;/)l|l$(a;  +  xr,  n )  —  l$(x,  n)\. 

n= 0  x'£X 


!$(:£,  n) 


Finally,  to  prove  onr  third  claim  (11),  note  that  for  any  fixed  x  e  X,  (8)  gives: 


N-l 


N-l 


N-l 


22  i<&(x,n)  =  J2  22  p9((p)iv(x,n)  =  22  p*(^)5I 

n=0  n= 0  pcz*  n= 0 


1,  <p(x)  =  n, 
0,  <p(x)  ^  n. 


(15) 


Since,  as  previously  noted,  we  have  (p(x)  =  n  for  exactly  one  n,  (15)  becomes: 

N-l 

J^T$(x,n)  =  22  =  1-  □ 

"=°  V&K 

An  example  illustrating  the  direct  computation  of  the  left-hand  side  of  (10)  is 
given  in  Figure  5.  Note  that  Theorem  3  implies  that  the  error  term  e  in  (10)  will  be 
small  provided  the  probability  1  $(x,  n)  of  assigning  label  n  to  x  changes  little  as  x 
varies  over  regions  smaller  than  the  the  support  of  w.  In  the  next  section,  we  discuss 
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Figure  5.  An  example  of  how  to  compute  the  left-hand  side  of  (10)  explicitly  as  a  probability- 
weighted  sum.  For  the  sake  of  readability,  larger  numerical  values  are  represented  by  darker  shades 
throughout.  We  consider  two  2x2,  3-bit  images,  namely  {fn}n=o  in  yX  where  IV  =  2,  A  =  Z2  ©Z2 
and  y  =  Zg.  In  this  particular  example,  the  values  of  the  /„’ s  are  all  distinct,  with  f0  assuming 
values  {0,1,2,  3}  and  /i  assuming  values  {4,  5,6,  7}  (far  left).  There  are  N\x\  =  22"  =  16  distinct 
label  functions  <p  :  Z2  ©  Z2  — >  Z2  (left  column)  each  yielding  a  composite  image  occ¥,{/o,  /i}  (center 
column);  in  accordance  with  (7),  we  take  values  from  fo  in  places  where  <p  is  white  and  values  from  /i 
where  ip  is  black.  Each  of  these  composites  has  a  2x  2x  8  local  histogram  transform  (1)  (right  column) 
which  was  computed  using  the  weighting  function  w  =  ^o,o  +  j(<5i,o  +  <$o,i)-  Since  occlusion  (7) 
is  nonlinear,  there  is  no  clean  relationship  between  the  local  histograms  of  any  single  composite 
and  the  local  histograms  of  the  source  images  fo  and  f\.  Nevertheless,  under  certain  hypotheses, 
we  can  say  something  about  the  average  of  these  local  histograms  (far  right)  with  respect  to  some 
probability  density  function  P$  on  the  set  0f  all  possible  <p’s;  in  this  example,  we  let  each 

ipk  have  equal  probability,  that  is,  we  let  P$(<^fc)  :=  Pk  =  pg-  In  particular,  if  the  occlusion  model 
is  flat  (16),  meaning  in  this  case  that  the  probability- weighted-sum  of  all  <p's  is  a  constant  function 
of  x,  then  Theorem  4  states  that  this  average  is  a  convex  combination  of  the  local  histograms  of  fo 
and  fi  as  depicted  in  Figure  6. 


what  hypotheses  on  <f>  will  make  the  e  in  Theorem  3  vanish  completely. 

3.3  Constructing  Flat  Occlusion  Models 

In  this  section,  we  provide  a  sufficient  hypothesis  on  the  occlusion  model  so 
as  to  ensure  that  the  local  histograms  (1)  of  a  composite  image  (7)  can,  on  average 
with  respect  to  P$,  be  decomposed  in  terms  of  the  local  histograms  of  the  individual 
images.  In  particular,  we  focus  on  the  special  case  where  the  occlusion  model  <f>  is 
flat ,  meaning  that  on  average,  the  probability  that  <f>  chooses  label  n  at  a  given  pixel 
location  x  is  equal  to  the  probability  of  choosing  n  at  any  other  x'\  formally,  <f>  is  flat 
if  there  exists  scalars  {\n}„  flQ  such  that  for  each  n  E  Z^: 

E  P^V7)  =  ^n,  Vx  E  X.  (16) 

1 

ip(x)=n 

Equivalently,  <h  is  flat  if  l$(:r,  n) ,  dehned  in  (8),  is  constant  with  respect  to  x.  That  is, 
$  is  flat  if  the  marginal  distributions  obtained  by  fixing  any  given  x  E  X  are  identical. 
Note  that  for  any  fixed  x  E  X,  summing  (16)  over  all  n  yields  that  Y^n=  i  =  P 
Indeed,  at  any  given  pixel  location  x,  the  value  An  represents  the  probability  that 
the  random  label  function  <f>  will  have  label  n  at  that  x.  In  our  example  where  the 
values  of  p  are  determined  by  |  ^  |  spatially- independent  coin  flips,  the  probability  of 
getting  any  particular  <p  E  Z*  is  P$(<^)  =  p^_1P^(l  —  p) substituting  this 
expression  into  (16),  we  find  that  this  model  is  indeed  flat  with  A0  =  1  —  p  and  Ai  =  p, 
as  detailed  below.  Note  that,  if  p  >  h  the  resulting  random  image  occ${/o,/i}  will 
be  more  white  than  black;  flatness  does  not  mean  that  each  label  is  equally  likely, 
but  rather  that  the  chance  of  being  white  at  any  given  pixel  location  is  the  same  as 
at  any  other  location.  These  concepts  in  hand,  we  present  one  of  our  main  results, 
which  formally  claims  that,  on  average,  the  local  histograms  of  composite  images 
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produced  from  flat  occlusion  models  are  but  mixtures  of  the  local  histograms  of  the 
source  images: 

Theorem  4.  //<£>  is  flat  as  in  (16),  then  the  expected  value  (9)  of  the  local  histogram 
transform  (1)  of  a  composite  image  (7)  is  a  convex  combination  of  the  local  histograms 
of  each  individual  image: 

N- 1 

F^(^)(LB-^OCCAfn}nflo)(X,y)  =  A”(LH  wfn)(x,y).  (17) 

n=0 

Proof.  If  <h  is  flat,  l$(x  +  x',n)  =  l$(a;,n)  for  all  x,x'  G  X.  The  error  bound 
in  Theorem  3  then  gives  £  =  0.  Denoting  1  <s>(x,n)  as  A„,  in  (10)  thus  yields  our 
claim.  □ 

Therefore,  when  $  is  flat,  (10)  simplifies  to  (17),  and  so  the  in-depth  compu¬ 
tation  of  Figure  5  can  be  replaced  by  the  much  simpler  one  depicted  in  Figure  6. 
Thus,  flatness  is  indeed  an  important  theoretical  assumption  for  the  analysis  of  local 
histograms  of  textures  generated  via  random  occlusions.  It  nevertheless  remains  to 
be  shown  that  flatness  is  also  a  realistic  assumption  from  the  point  of  view  of  our 
motivating  application. 

We  now  provide  a  variety  of  methods  for  constructing  flat  occlusion  models.  Some 
of  these  models  produce  textures  similar  to  those  encountered  in  digital  microscope 
images  of  histological  tissues. 

Theorem  5.  If  &  is  a  Bernoulli  random  variable,  then  <L  is  flat. 

Proof.  Let  K  =  \X\  and  claim  that  for  any  F  :  Tflfl  — >  M,  we  have: 

N- 1  N—l 

£  nv)  =  £■■■£  £  nv).  (is) 

1U=0  nK= o  pez* 

i p{xk)=nk,\tk 
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Figure  6.  A  continuation  of  the  example  of  Figure  5.  When  the  occlusion  model  $  is  flat,  Theorem  3 
becomes  Theorem  4,  with  (10)  simplifying  to  (17).  Though  each  of  the  16  distinct  composite  images 
shown  in  Figure  5  has  a  distinct  local  histogram  transform,  the  average  of  these  16  local  histogram 
transforms  with  respect  to  P$  is  but  a  convex  combination  (right)  of  the  local  histograms  (center)  of 
the  two  source  images  (left).  That  is,  when  $  is  flat,  the  average-over-all-composites  local  histogram 
computed  in  Figure  5  is  equal  to  the  average-over-all-sources  local  histogram  computed  above. 


To  show  this,  we  enumerate  X  as  X  =  {xk\k=i  and  note  that  for  any  Xk,  we  have 
Snfc=o  1Axk,  nk)  =  1  thus  giving  our  claim: 

K  N—l 

F^  =  mUE  1  IT'k) 

<p£Z%  <f£Z*  k= 1  nk= 0 

N- 1  N-l 

=  E  F (*o  E 1  ip(x1,  m)  •••  y:  1  <p(xK,  nK) 

i p£Z*  n  i=0  nK= 0 

N-l  N-l  I< 

=  F (^)  II 

ni=0  nK= 0  tf£Z*  k=  1 

N-l  N-l 

=  F^)- 
ni=0  nK= 0  <f£Z* 

< p(xk)=nk,Vk 

To  show  <f>  is  flat,  fix  some  xko  G  X  and  note  that  enumerating  X  gives  that: 

I< 

P(j>((/?)  —  |  Pip(x)  —  J_  J_  Pp>{xk)- 

x£X  k= 1 
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Using  this,  and  letting  F(p)  = 

P*Ml  v{xkopn) 

in  our  claim  (18),  gives: 

TV-1 

N-l 

X  P^Ml v(xk0,n) 

=  £■■■ 

£ 

X  p#(^i^k,?i) 

‘ P^N 

nj=0 

«jf=o  c^ez* 

<p(xk)=nk,\/k 

TV-1 

N-l 

K 

=  £- 

£ 

X!  n)- 

ni=0 

™if=0  c^eZ*  fc=l 

ip(xk)=nk,Vk 

Note  that  since  p(xk)  =  nk  for  all  k,  we 

have  p^)  =  pnk  for  all  fc,  thus  giving: 

TV-1 

TV-1 

I< 

>  =  £■ 

"  £ 

x  rw,(^) 

ni=0 

nK=0 

v>ez*  fc=i 

ip(xk)=nk,Vk 
N- 1  TV— 1 

=  'y  ^  Pm  •  •  •  'y  ^  PnK  ^  ]  ltp(xk0i  ri). 

ni=0  nK=0  <^EZ* 

tp(xk)=nkyk 

Here,  we  note  that  for  the  sum  over  p  G  Z^,  the  constraints  that  p(xk)  =  nk  for  all 
k  uniquely  define  a  single  ip.  Therefore,  this  sum  is  one  if  n  =  nko'. 

N-l  TV- 1 

y  ^  p  ®((p)]-<p(xk0,,n) =  y  ^  Pm  ■  ■  ■  y  ^  pnK^ninko) 

m=0  nif =0 

TV-1  s  x  TV-1 

Pnfco^n('^fc0)  )  f  X!  Pnk 

71 /pq — 0  }z^L}zq  Tlfc — 0 

=  Pn-  n 

Next,  we  show  that  an  occlusion  model  <f>  is  flat  if  it  is  translation-invariant , 
meaning  that  its  probability  density  function  P$  satisfies: 

P$(T"cy?)  =  P<j,(<p),  \/p  G  Z^-,x  G  X.  (19) 
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Note  that,  if  we  consider  the  value  of  ip  at  each  given  a:  £  if  as  a  Z^y- valued  random 
variable,  translation-invariance  corresponds  to  the  associated  stochastic  process  being 
stationary. 

Theorem  6.  If  <f>  is  translation-invariant  (19),  then  <f>  is  flat  (16). 

Proof.  To  show  that  <f>  is  flat  (16),  we  must  show  that  for  any  n  e  Zjy,  1  (x,n)  is 
constant  with  respect  to  x.  Using  (8),  and  noting  that  the  definition  of  the  translation 
operator  gives  that  (p(x)  =  TXo~xip(xo)  for  any  xQ  G  X,  we  have: 

T (x,n)=  Y  P*(<P)  =  Y  P*M- 

tp(x)=n  TxO~xip(xo)=n 

Now,  making  a  change  of  variables  by  letting  if  =  T x°~xip:  and  using  our  assumption 
that  <f>  is  translation-invariant  (19)  gives: 

U(x,n)  =  Y  P^T*"^)  =  P *W0  =  U(®o,n).  □ 

Tx-xotp&zfr  iieis* 

%jj(xo)=n  il>(xo)=n 

Theorem  6  indicates  that  flatness  is  not  too  strong  of  an  assumption.  Indeed,  one 
method  for  producing  a  flat  model  is  to  generalize  the  coin-flipping  example  given 
in  the  introduction:  given  any  random  method  for  picking  a  number  from  Z^r — a 
probability  spinner — produce  ip  by  conducting  \X\  independent  spins.  The  resulting 
model  is  translation-invariant,  and  therefore  flat,  since  P<j>(9?)  is  solely  determined 
by  the  number  of  times  that  ip  achieves  each  given  value  n.  Note  this  implies  that 
the  Bernoulli  model  is  flat,  a  fact  we  independently  observed  in  Theorem  5.  Other 
translation-invariant  examples  abound.  For  instance,  for  any  fixed  </?0,  we  can  assign 
equal  probability  to  (p0  and  each  of  its  translations,  and  assign  probability  0  to  all 
others;  if  the  source  images  {fn}n=o  are  constant,  the  composite  images  (7)  produced 
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by  such  a  model  are  all  translates  of  a  single  image.  More  generally,  we  can  always 
partition  the  N^x\  elements  of  into  translation-invariant  equivalence  classes  and 
assign  any  fixed  probability  to  the  members  of  each  class,  provided  we  ensure  that 
in  the  end  they  all  sum  to  one.  For  example,  for  the  case  N  —  2  and  X  =  Z2  ©  Z2 
depicted  in  Figure  5,  we  may  partition  the  16  possible  <^’s  into  7  such  classes,  and  pick 
any  probabilities  {pfc}£L0  such  that  Pi  =  P2  =  P3  =  Pu  Ps  =  Pe,  P7  =  Ps,  P9  =  P10, 
P11  =  P12  =  P13  =  P14.  Armed  with  one  general  method — translation-invariance — for 
producing  flat  models  <f>,  we  now  turn  to  ways  of  combining  known  models  to  produce 
more  complicated  and  realistic  ones. 

3.3.1  Expansion. 

Digital  microscope  images  of  histological  tissues  often  contain  randomly  distributed 
blobs.  These  blobs  correspond  to  biological  structures:  cells,  nuclei,  etc.  The  nature 
of  these  processes  guarantees  that  the  distribution  of  such  structures  is  roughly  uni¬ 
form,  both  spatially  and  in  terms  of  color:  two  cells  cannot  occupy  the  same  space; 
cells  will  usually  grow  and  reproduce  so  as  to  occupy  any  empty  space;  cells  in  a 
given  tissue  all  have  approximately  the  same  size  and  color  patterns.  We  want  to 
construct  flat  occlusion  models  that  emulate  such  textures,  since  in  light  of  Theo¬ 
rem  4,  doing  so  would  formally  justify  the  decomposition  of  local  histograms  as  part 
of  a  segmentation-and-classification  algorithm.  Note  that  there  is  a  natural  method 
for  randomly  generating  a  set  of  roughly  uniformly-distributed  points:  flip  a  coin  at 
each  point  x.  Here,  we  explore  the  idea  of  expanding  each  of  these  randomly  generated 
points  into  a  given  blob. 

To  be  precise,  let  ip  G  Z*  indicate  a  set  of  randomly  generated  points.  For  each 
of  the  points  x  G  X  for  which  ip(x )  =  1,  we  will  replace  it  with  a  blob  whose  shape 
is  indicated  by  some  G  Z*.  The  new  texture  will  be  the  union  of  all  such  blobs. 


39 


Formally,  given  any  p  G  Z*  and  {if)x}xex  G  [Z*]*,  we  define  the  expansion  of  p  by 
{ipx}xex  to  be  p  *  {i>x}xex  e  Z^, 


(<^*{r/v}vev)0£) 


1,  X  =  x'  +  x",  </?(x7)  =  1,  tpx'ix")  —  1, 
0,  else. 


(20) 


Two  examples  of  this  expansion  operation  are  given  in  Figure  7.  Note  that  expansion 
itself  (20)  is  not  an  occlusion  model.  Indeed,  (20)  is  but  a  way  of  combining  functions 
in  Jj2  to  produce  other  ones,  whereas  an  occlusion  model  is  a  random  variable  <f> 
defined  by  a  probability  density  function  P$  over  7L* ■  This  fact  notwithstanding, 
the  expansion  operation  (20)  on  label  functions  p  and  {if ’X}xex  does  in  fact  induce  a 
parallel  operation  on  their  random  variable  cousins  <f>  and  T.  To  be  precise,  given  two 
occlusion  models  <f>  and  T  from  X  into  Z2,  we  define  the  expansion  of  <f>  by  T  to  be 
the  occlusion  model  whose  probability  density  function  is  :  Z*  — >  [0, 1], 


n  P*(Vh).  (21) 

(f€.Zj2  xdX 

{■tpx}xex£[Z$]x 
<fi*{lpx}x£X=<J 

Note  that  the  probability  that  <f>  *  T  will  produce  a  given  label  function  o  depends 
on  the  ways  in  which  a  can  be  written  as  </2*{t/h}a;eA’  and,  moreover,  the  probability 
that  <f>  and  T  will  produce  those  particular  <^’s  and  ^x’s,  respectively.  In  the  next 
result,  we  verify  that  (21)  indeed  defines  a  probability  density  function  on  Z^.  We 
further  show  that  if  $  is  translation-invariant  (19),  then  is  translation-invariant 
which  implies  that  <f>  *  T  is  flat  by  Theorem  6.  In  particular,  image  models  which 
produce  collections  of  blobs  similar  to  those  found  in  biological  tissues  will  indeed  be 
flat  provided  the  distribution  that  produces  the  “centers”  of  these  blobs  is  translation- 
invariant.  Moreover,  if  the  flatness  of  <f>  *  T  is  all  that  is  desired,  we  can  weaken  the 


40 


Figure  7.  Examples  of  the  expansion  operation  (20),  where  black  denotes  the  value  of  1,  and  the 
lighter  shade  denotes  the  value  of  0.  A  function  ip  :  X  — >  {0, 1}  is  given  in  (a),  and  can  be  chosen, 
for  example,  via  a  sequence  of  \X\  independent  coin  flips.  Meanwhile,  for  each  a:  €  A,  we  pick  a 
corresponding  function  xpx  :  X  — »  {0, 1}.  Cropped  versions  of  a  few  examples  of  such  i/^’s  are  given 
in  (b).The  expansion  <p  *  {4>x}xex  of  p  by  {ipx}x£X  is  given  in  (c).  Essentially,  each  point  x  for 
which  p(x )  =  1  is  replaced  with  the  corresponding  blob  'ij)x,  with  the  origin  of  the  ipx  coordinates 
being  translated  to  x.  In  the  second  row,  (f)  shows  the  expansion  of  a  second  set  of  points  <p'  by 
a  second  set  of  blobs  {‘4)'x}x^x-  These  examples  notwithstanding,  note  that  (20)  does  not  require 
these  blobs  to  be  disjoint.  We  could  have,  for  instance,  produced  a  texture  by  expanding  the  points 
in  (d)  by  the  blobs  in  (b).  Nevertheless,  stronger  conclusions  can  be  made  if  such  disjointness  is 
enforced;  see  Theorem  7. 
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requirement  that  $  be  translation-invariant  so  as  to  only  require  that  d>  is  itself  flat, 
provided  d>  and  d/  are  effectively  disjoint : 

If  P$(<^)  >  0  and  P<i»('0a:)  >  0  for  all  x  G  X,  then  ip-k  {ifx}x&x  =  Xxifx.  (22) 

rcEA’ 

v(x)= 1 

Put  another  way,  (22)  means  that  there  is  only  at  most  one  way,  with  nontrivial 
probability,  in  which  the  x  in  (20)  can  be  written  as  x  =  x'  +  x"  where  both  c p(x ')  =  1 
and  ifx>{x")  =  1. 

Theorem  7.  If  <f>  and  d'  are  occlusion  models  from  X  into  Z2,  then  their  expansion 
d>*d>,  with  probability  density  function  (21),  is  as  well.  Moreover,  if  $  is  translation- 
invariant  (19)  ,  then  d>  ★  di  is  translation-invariant.  Furthermore,  if  d>  and  d'  are 
effectively  disjoint  (22)  and  either  d>  or  d'  is  flat  (16),  then  d>  *  d'  is  flat. 

Proof.  We  first  show  that  (21)  defines  a  probability  density  function,  namely  that 
values  of  P$*^(er)  over  all  a  in  7Lf  sum  to  1.  Since  P$  is  a  probability  density 
function  by  assumption,  we  have: 

i  =  5]  p»(«a  (23) 

Similarly,  for  any  fixed  x  G  X ,  we  have: 

1=  p*(vg,  (24) 

where  the  subscript  “x”  on  if  indicates  that  this  particular  if  is  intended  to  expand 
ip  at  the  particular  point  x  as  opposed  to  at  some  other  point.  Taking  the  product 
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of  (23)  with  the  product  of  (24)  over  all  x  yields: 


i = i(i)'-v' = p»(¥>)  n  x  p*(«  =  e  p*(^)  np*(«>  <25> 

x£X  x£X 

{M*extp?]x 

where  the  final  quantity  in  (25)  contains  all  of  the  cross  terms  resulting  from  dis¬ 
tributing  the  product  over  all  sums  of  the  form  (24).  Now,  since  for  each  choice  of  p 
and  {i^x}x&x  there  is  exactly  one  resulting  a  =  tp  *  {/ipx}xeX,  we  can  rewrite  (25)  in 
terms  of  the  definition  (21)  of  P$*^,  obtaining  our  claim: 

i = e  e  p*(v)  n  p*wj = e  p*.*<ff)- 

a£^2  x&X 

{i’x} xCX  s[z^]A 

<P*{'l>x}xeX=<r 

Thus,  (21)  indeed  defines  a  probability  density  function,  as  claimed. 

We  next  show  that  the  occlusion  model  $  *  T  is  translation-invariant,  if  <f>  is 
translation-invariant.  To  do  this,  we  first  claim  that  if  Txcr  =  (p  *  {i/)x}xeX,  then 
cr  =  (X~xp)  -k  {V’x+xjzeAr-  To  see  this  claim,  note  that 

a(x  -x)  =  (Txa)(x)  =  (^*{^'W)(i)  =  1 

if  and  only  if  there  exists  some  x',  x"  in  X  such  that  x  =  x'  +  x ",  <p(x')  =  1, 
and  ijjx'ix")  =  1.  Letting  x  =  x  —  x,  we  thus  have  that  a(x)  =  1  if  and  only 
if  x  —  (x'  —  x)  +  x",  where  ( T~xp)(x '  —  x)  —  ip{x'  —  x  +  x)  =  (p(x')  =  1  and 
i/j(x>-x)+x{x")  =  ipx'(x")  =  1,  implying  a  =  (J~xp) -k  {^x+x}x&x,  as  claimed.  Having 
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the  claim,  (21)  implies: 


p***(t*<7)  =  y,  p$(<^)  n  p^(^) 

x€lX 

{i’x}x  ex£[Zx]x 

ip*{ipxjxex=Txcr 


xi  p.Mnp.w. 


tp£%  2  x£X 

{4’x}x£X^[^2^X 

(T -X<p)*{4>x+s}xex=<r 


To  continue,  we  make  the  change  of  variables  p'  :=  T  xp  and  ij)'x  :=  'ipx+x: 


p^fn-r)  =  ^  p»(t>')  n  p*(^-*)- 

v'ez$  xex 

Wx}xex&[^]x 
(¥>')*{  ^'x}xeX=cr 


Since  $  is  translation-invariant  and  n  p*(^-.) = n  P'i'(V4),  we  have: 

x£X  x£X 

P^(TV)  =  Y  P*(p')  I]  =  P***(^)> 

^EZ^  x£X 

Wxexeiqf]* 

(v')*{iPx}xex=(r 

and  so  <F  *  T  is  indeed  translation- invariant  (19),  as  claimed. 

For  our  final  claim,  we  assume  that  <3>  and  T  are  effectively  disjoint  (22)  and 
that  either  $  or  f  is  flat.  To  do  so,  it  is  helpful  to  characterize  the  flatness  of  an 
arbitrary  occlusion  model  $  from  X  to  Z2  in  terms  of  the  corresponding  function 
<F  :=  Y^ipezg  Indeed,  for  any  p  from  X  to  Z2,  (2)  may  be  rewritten  as 

lp(x,  1)  =  p(x)  and  so: 


MM)  =  Y  PM)MM)  =  Y  PM)M)  =  $(M  (26) 

In  light  of  (26),  we  claim  that  <F  is  flat  if  and  only  if  $  is  constant.  Indeed,  if  <F  is 
flat,  then  there  exists  Ai  such  that  <F(a;)  =  l$(x,  1)  =  Ai  for  all  x  G  X.  Conversely,  if 
<F(a;)  is  constant,  then  there  exists  Ai  such  that  l$(a;,  1)  =  <&(#)  =  Ai  for  all  x  G  X\ 
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by  (11),  this  further  implies  that  l$(:r,  0)  =  1  —  l$(a;,  1)  =  1  —  Ai  for  all  x  G  X  and 
so  <f>  is  flat. 


Having  this  claim,  we  show  that  <f>  *  T  is  flat  by  showing  that  <f>  *  T  is  constant. 
To  do  this,  we  show  that  if  <f>  and  T  are  effectively  disjoint  then  <f>  *  T  =  <f>  *  T  where 
denotes  standard  convolution  over  X.  According  to  the  definition  of  <f>  *  T  (21) 
we  have: 


$  *  ^  =  X]  p<j-^(°')(T  =  X] 


(tEZ^ 


(<P*{ipx}xex)-  (27) 


<pez-2 
Wx  }xex£[z%}x 


Since  any  particular  choice  of  <p  and  {ipx}xex  produces  a  unique  a  via  *  we  can 
simplify  (27)  to 


<f>*  T  = 


(f*  {’/’xirex)- 


(28) 


V?EZ: 

{4’x}xex^[^2^x 


x'&X 

<f(x')=l 


Moreover,  since  <f>  and  4/  are  effectively  disjoint  (22)  we  have  <p*{i/)x}xex  =  Xx  Vh 
meaning  (28)  becomes: 


e  p*(»’)  ( n  p*<^) )  (  e  rx,’ ) 

\xex  )  \  x'ex  / 

{M*exe\tif]x  v[x’)=l 


<h  *  T  = 


E  p.m  E  T*' 


x'ex 

<p(x')=l 


yi  n^)k 


(29) 


Now,  for  any  hxed  x'  G  X  such  that  <p(x')  =  1,  we  factor  the  corresponding  innermost 
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sum  in  (29)  into  a  product  of  A  distinct  sums — one  for  each  x  G  A — to  obtain: 


e  n  p*C0z)  jVv  =  n  e  ~Py{ipx)  I 

cveBfl^  \x£X  )  _x^x'  Vi/j / 


7^'  \ipxez$ 

ruV 

x^x'  / 


y  p ^('tpx')^ 

i>x>£%2 


=  T. 


(30) 


Substituting  (30)  into  (29)  then  gives: 


p*(^)  y  T‘T  'I' 


i p£Z% 


x'&X 

<p(x')=l 


^P.M  E  M** 

V  x'ex  / 


v(x’)=i 


=  I>  *  T. 


p ${(p)(p  I  *  vi; 

SSZ?  / 


Thus,  the  effective  disjointness  of  d>  and  d'  indeed  implies  d>  *  d'  =  d>  *  dG  As  such,  if 
we  further  assume  that  either  d>  or  d'  is  flat,  then  either  d>  or  d'  is  constant,  implying, 
in  either  case,  that  d>  *  T  is  constant  and  so  d>  *  d'  is  flat.  □ 

3.3.2  Overlay. 

Above,  we  discussed  how  the  expansion  (21)  of  a  binary-valued  occlusion  model 
d>  with  another  such  model  d'  is  a  new  model  d^d1  that  randomly  generates  label 
functions  of  the  form  a  =  ip-k  as  defined  in  (20).  Under  certain  hypotheses, 

Theorem  7  gives  that  such  models  d>*d/  are  flat,  meaning  their  local  histograms  can  be 
understood  in  terms  of  Theorem  4.  Moreover,  some  examples  of  these  models  produce 
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textures  that  resemble  those  encountered  in  histological  tissues:  if  f0  and  fi  are 
roughly  constant  light  purple  and  dark  purple  fields,  respectively,  then  the  composite 
image  occv{/o,/i}  obtained  by  picking  tp  as  in  Figure  7(f)  bears  some  similarity  to 
an  actual  image  of  cartilage,  such  as  the  one  given  in  Figure  3(b).  Taken  together, 
these  facts  provide  some  theoretical  justification  for  the  use  of  local  histograms  for 
the  analysis  of  such  tissues. 

There  is  however  a  deficit  with  this  theory:  due  to  the  nature  of  the  construc¬ 
tion  (20),  models  produced  by  expansion  (21)  can  only  be  binary- valued,  and  as  such 
are  insufficient  to  emulate  textures  that  exhibit  three  or  more  distinct  color  modes, 
such  as  the  pseudovascular  tissue  depicted  in  Figure  3(d).  In  this  subsection,  we 
discuss  a  method  for  laying  one  occlusion  model  over  another  which,  amongst  other 
things,  permits  us  to  build  multivalued  models  out  of  binary-valued  ones.  To  be  pre¬ 
cise,  for  any  tp  G  Zjy  ,  0  G  Z^2  and  a  G  Z*,  we  define  the  overlay  of  ip  over  0  with 
respect  to  a  to  be  (pHza'^  £  Z^i+iV2, 


O#<r0)O) 


ip(x),  <t(x)  =  0, 

ip(x)  +  Nx,  cr(x)  =  1. 


(31) 


Essentially,  an  overlay  (31)  is  the  result  of  cutting  holes  out  of  an  image  of  tp  and 
laying  it  on  top  of  an  image  of  0;  the  location  of  these  holes  is  indicated  by  a  and 
the  values  of  0  are  increased  by  a  factor  of  N\  so  that  they  cannot  be  confused  with 
those  of  tp.  Examples  of  this  overlay  operation  are  given  in  Figure  8. 

In  a  manner  similar  to  the  relationship  between  (20)  and  (21),  we  have  that  (31) 
naturally  induces  a  parallel  operation  on  occlusion  models:  given  probability  density 
functions  P$,  and  Ps  on  Zjy  ,  Zjy  and  Z*,  respectively,  we  define  the  overlay 
of  the  occlusion  model  <f>  over  T  with  respect  to  E  to  be  the  new  occlusion  model 
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•«  _ 

♦** 


"<*■•}  ‘i,  ?' 

vv  ♦•ev©» 


(a)  CT 


te#  tV#i  # 

(b)  0#CTer' 


Figure  8.  Two  examples  of  the  overlay  operation  (31).  Recall  the  two  {0,  l}-valued  label  functions 
<t  =  ip  *  {4>x}xgx  and  o’  =  ip'  *  {tl>'x}x&x  of  Figure  7(c)  and  (f)  reshown  here  in  (a)  and  (c), 
respectively.  Further  consider  a  constant  function  0  :  X  — >  Zi  that  assigns  label  0  to  every  point 
in  X.  The  overlay  (31)  of  0  over  a'  is  given  in  (b);  essentially,  er-shaped  holes  are  cut  from  0  and 
the  result  is  laid  over  cr',  resulting  in  a  new  texture.  A  distinct  texture  can  be  produced  by  cutting 
cr'-shaped  holes  out  from  a  and  laying  the  result  over  the  constant  function  0  (d).  Overlaying  the 
resulting  textures  with  each  other  can  produce  even  more  complicated  textures. 
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whose  probability  density  function  is  P$#E^  :  +JV  — »  [0, 1], 


Ep  *(^)P*(V0PE(cr).  (32) 

crEZ^ 


In  the  next  result,  we  verify  that  (32)  indeed  defines  a  probability  density  function, 
and  moreover  that  the  corresponding  model  is  flat  provided  <f>,  \P  and  E  are 

flat,  meaning  that  the  local  histograms  (1)  of  composite  images  (7)  produced  by  such 
a  model  will  behave  according  to  Theorem  4. 


Theorem  8.  //$,  T  and  E  are  occlusion  models  on  Z^<(> ,  Z^  and  Iflfi ,  respectively, 
then  (32)  defines  a  probability  density  function  on  ^%<P+Nxll ■  Moreover,  if&,  T,  and 
E  are  flat,  then  is  flat. 


Proof.  To  show  that  (32)  defines  a  probability  density  function  on  2$  +JV  ,  note  that: 


i  =  (i)(ixi)  =  xi  p*(v>)  E  p*w  E  p^w  =  E  p*(v)p*wpeW-  (33) 

crEZ^ 


Noting  that  for  each  fixed  99,  and  a,  there  exists  exactly  one  u  G  Z^,i>+7V>1<  such 
that  (pffo Tfi  =  v,  (33)  becomes: 


1  =  E  E  p*(^)p*(v>)ps(ff)  =  E 

V£ZN$+Ny  PeZN#  VeZN^+N^ 

ctEZ^" 


as  claimed.  For  the  second  conclusion,  assume  that  <f>,  T,  and  E  are  flat.  Our  goal  is 
to  show  that  <h#s'h  is  flat  (16),  meaning  that  for  any  n  e  Z^+at^,  we  want  to  show 
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that  there  exists  a  scalar  Xn  such  that: 


7  , 

v£Zn^+nxs, 

v(x)=n 


V 


—  /Xt" 


for  all  x  e  X.  To  see  this,  note  that  for  any  such  x  and  n,  we  have: 


(34) 


y  ^  p^#E^,(^o 

=  E 

J2  p$(^)p^(^)ps(u 

t'eZ^S+JV^ 

W£ZiVs+iV^ 

v(x)=n 

v(x)=n 

V>ez^ 

-  V  PtWPvWPzfr).  (35) 

crGZ^ 

(<P#*'*l>)(x)=n 


Now,  in  the  special  case  where  n  —  0, . . iV$  —  1,  (31)  gives  that  ((p^ffa/ip)(x)  =  n  if 
and  only  if  cp(x)  =  n  and  a(x)  =  0.  As  such,  in  this  case  (35)  becomes: 


y  ^  p##^  (t) 


VeZiVS+iV^ 

v(x)=n 


P^MP^WPeI 

l^e2Wi^(:t)=n 

erGZ 2  ,c{x)=0 


cr 


S  p*^)  Y1  Psl 


o' 


<f(x)=n 

=  A$nAs,o- 


cr(a:)=0 


(36) 


If,  on  the  other  hand  n  =  JV$, . . . ,  1V$  +  1V^  —  1  then  (31)  gives  that  ((p^fa/ip)(x)  =  n 
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if  and  only  if  i^(x)  —  n  —  N§  and  a[x)  =  1.  In  this  case,  (35)  becomes: 

p,#e*M  =  5]  P»WP»WPEW 

V£ZNi,  +  Nx!! 

v(x)=n  i/igZjy  ,ip(x)=n—N& 

crEZ*  ,<r(x)=l 

=  E  p*w  E  p*«o  E  psW 

v>ez^  aeZ2 

ip(x)=n—N$  a(x)=l 

=  (37) 

Thus,  for  any  x  G  X  we  either  have  (36)  or  (37)  meaning  <h^svh  is  flat  (34),  as 
claimed.  □ 

Up  to  this  point,  we  have  considered  the  appropriateness  of  using  local  histograms 
to  segment  and  classihy  histology  images  like  the  one  in  Figure  1(a).  In  particular,  we 
have  shown  that  local  histograms  have  several  useful  properties:  they  commute  with 
translations  on  X  and  (F,  and  they  distribute,  on  average,  over  occlusions  provided 
that  our  occlusion  model  is  flat.  In  the  next  chapter,  we  characterize  all  transforms 
which  satisfy  these  three  key  properties  of  local  histograms.  In  doing  so,  we  demon¬ 
strate  the  significance  that  local  histograms  have  in  analyzing  textures  commonly 
found  in  histology  images. 
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IV.  Characterizing  Local  Histogram  Transforms 


In  this  chapter,  we  characterize  all  (possibly  nonlinear)  transforms  which  possess 
the  nice  occlusion-handling  properties  of  local  histograms.  Our  main  result  here  is 
Theorem  10,  which  formally  shows  that  these  properties  are  almost  unique  to  local 
histograms.  In  particular,  Theorem  4  gives  that  the  local  histogram  transform  of 
a  random  composite  image  (7)  produced  by  a  flat  occlusion  model  is,  on  aver¬ 
age,  a  convex  combination  of  the  local  histogram  transforms  of  the  source  images 
{ fn}n=o  ■  Here,  the  coefficients  in  this  convex  combination  are  the  values  {Anj^Tg1 
defined  in  (16).  In  the  next  result,  we  show  that  this  is  not  an  accident,  namely  that 
if  any  transform  distributes  on  average  over  occlusions  in  this  manner,  the  resulting 
coefficients  are  necessarily  these  particular  values: 

Theorem  9.  Let  0  be  any  (possibly  nonlinear)  transformation  from  yx  into  any  real 
vector  space.  If  Q(SX)  0(0)  for  some  x  G  X  and  if,  for  some  flat  occlusion  model 
<f>,  there  exists  scalars  {cy}^1  such  that: 

N—l 

E$[0(occ${/r),}(^roL )]  =  5>0(U  (38) 

n= 0 

for  all  { ,/n } n=o  yX ’ ,  then  these  scalars  are  necessarily : 

Cn  =  An  =  22  V  n  =  0, . . . ,  N  —  1.  (39) 

( p{x)=n 

Proof.  Fixing  any  such  flat  <h,  our  assumption  (38)  states: 

N-l 

P*(p)(~y0CCr{fn}nf)  )  =  22  Cne(fn)  (40) 

n= 0 
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for  all  {fn}n=o  in  yX  ■  Note  that  ©(/)  7^  0  for  some  /  G  yx:  since  &(6X)  0(0)  for 

some  x  6  A?,  at  least  one  of  the  selections  /  =  0  or  /  =  Sx  will  suffice.  Letting  fn  =  f 
for  all  n  gives  that  occ^fn}^^  =  f  for  all  £  Zf?.  Noting  that  the  values  P$(<^) 
sum  to  one,  (40)  in  this  special  case  becomes: 

N—l  ,N- 1  s 

e(/)  =  ]T  P*M0(/)  =  J2 c”e</)  =  (  EC”)W)- 

tp£Z*  n=0  '  n= 0  ' 

which,  since  ©(/)  7^  0,  necessarily  implies  Jf^Io  Cn  =  1-  Having  this  fact,  we  again 
let  the  /„’ s  be  arbitrary,  and  subtract  0(0)  from  (40): 

TV— 1 

J2  P*M[0K{/4S)  -  0(°)]  =  Cn[Q(fn)  -  0(0)].  (41) 

n=° 


Now,  £x  xq  £  X  such  that  0(4Xo)  0(0),  that  is,  @(<5I0)  —  0(0)  7^  0.  Also,  fix  any 


n0  —  0, . . . ,  N  —  1,  and  let 


fn 


3xo )  n  no , 
0,  n  Ho¬ 


rn 


That  is,  for  any  x  £  X\ 


fn(x) 


1,  n  =  no,  x  =  xo, 
0,  else. 


(43) 


We  claim  that: 


occ^I/JOLo1 


3xo  >  ^(xo)  n0, 

0,  (f(x0)  7^  no- 


(44) 
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Note  this  is  equivalent  to  showing  that  for  any  x  £  X, 


U(x)(x)  =  (occ^l  fn}n=o)(x) 


{5X0(x),  tp(x0)  =  n0, 

0,  y(x o)  ^  no, 

{1,  x  =  x0,(p(x0)  =  no, 
0,  else. 


(45) 


For  x  ^  xo,  the  left-hand-side  of  (45)  is  f<p(x)(x)  =  0  by  (43),  while  the  right-hand-side 
of  (45)  is  clearly  zero.  Meanwhile  for  x  =  x0,  evaluating  (43)  at  x  =  x0  and  n  =  (p(x0) 
gives  the  left-hand-side  of  (45)  to  be: 


ftp(x0)  {xo) 


1,  ip(x0)  =  no, 
0,  else, 


which  equals  the  right-hand-side  of  (45)  at  x  =  x0. 

Having  (44),  we  can  decompose  the  left-hand-sum  of  (41)  as: 


P^)le(°CCAfn}n=o)  ~  9(0)] 

=  X  p*£)[©(4„)-*e(0)]  +  Xi  P*(V)[0(O)  -  0(0)] 

VGZ*  <p<EZ% 

tp(x0)=n0  <p(x0)^n0 

=  £  p«.£)[©(4„)  -  e(o)]. 

<p(xo)=no 


Meanwhile,  in  light  of  (42),  the  right-hand-sum  of  (41)  can  be  decomposed  as: 


N-l 

c„ [©(/„)  -  0(0)]  =  cno [G(SX0)  -  0(0)]  +  cn[0(O)  -  0(0)]  =  cno  [Q(SX0)  -  0(0)]. 

n=0  n^riQ 
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That  is,  in  the  special  case  (42),  the  relation  (41)  becomes: 


X  p»(*>)[e(4.)  -  e(o)]  =  c„je(<U  -  e(o)|.  (46) 

ip(x0  )=n0 

Dividing  both  sides  of  (46)  by  Q(SXo)  —  ©(0)  y  0  gives  our  claim  (39).  □ 

In  light  of  Theorem  9,  we  therefore  only  seek  to  characterize  those  transforms  0 
which  satisfy  E<j,[@(occ${/n}Or=r01)]  =  )Cn=o  ^n©(./n)-  Though  a  complete  character¬ 
ization  of  such  transforms  remains  elusive,  we  are  able  to  characterize  them  in  the 
special  case  where  the  codomain  of  0  is  and  0  is  further  assumed  to  commute 

with  translations  in  X  and  y\  see  Theorem  10  below  for  a  precise  statement.  That  is, 
we  show  that  if  0  transforms  images  /  :  X  — >■  y  into  data  cubes  @(/)  :  X  ©  y  — >  M 
in  a  way  that  distributes  over  occlusions  and  commutes  with  translation,  then  0  is 
necessarily  a  filtering  operation  on  the  graph  (2)  of  /.  The  proof  of  this  fact  uses 
discrete  Fourier  transforms  (DFTs)  over  the  finite  abelian  groups  X  and  y  of  pixel 
locations  and  pixel  values,  respectively. 

To  be  precise,  the  Fundamental  Theorem  of  Finite  Abelian  Groups  gives  that  y 
can  be  written  as  a  direct  sum  of  cyclic  groups: 

K 

y  ■■=  ©  %Mk  =  {(BtiV(k)  :  y(k)  G  ZA4  V  k  =  1, . . . ,  K},  (47) 

k= 1 

that  is,  y  :=  (t/(1),  . . .  ,y(K)).  In  8-bit  red-green-blue  (RGB)  images  for  example,  we 
have  y  =  Z'256  ©  Z256  ©  Z256,  that  is,  K  =  3  and  Mi  =  M2  =  M3  =  256.  The  DFT 
over  y  is  Fy  :  CA  — >  Cy, 


(F yg)(y)  :=  5]  gfe'lT2"'”'”'1, 
y'&y 
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where  the  (nonstandard)  dot  product  of  y,y'  G  y  is  defined  as: 


V  •  11 


v-  y{k)y\k) 

^  Mk 

k= 1  K 


(48) 


Note  since  each  value  y[k )  is  only  unique  modulo  Mk,  the  value  of  the  dot  product  (48) 
is  only  unique  modulo  Z.  That  is,  this  dot  product  is  not  a  unique  real  number,  but 
is  rather  an  equivalence  class  of  all  integer  translations  of  that  number.  We  define 
the  DFT  over  X  similarly.  Taking  the  tensor  product  of  these  two  DFT’s  yields  the 
DFT  over  X  ©  y,  namely  the  operator: 


(F x@yW)(x,y)  =  W(x\y’)e-2^x'Wy^.  (49) 

x'&X  y'ey 

These  facts  in  hand,  we  present  the  main  result  of  this  chapter. 

Theorem  10.  Let  0  be  any  (possibly  nonlinear)  transformation  from  yx  into 
Then  0  is  of  the  form  @(/)  =  If  *W  for  some  W  G  M*®-^  if  and  only  if  0  satisfies 
the  following  three  properties: 

(i)  0  commutes  with  translation  on  X:  @(T xf)  =  T (:r,o^0(/), 

(ii)  0  commutes  with  translation  on  y :  0(/  +  yl)  =  T(o,y^0(/), 

(in)  on  average,  0  distributes  over  all  flat  occlusion  models  <h: 

N—l 

Etiefocctt/nto1)]  =  J2  ve(/„).  (50) 

)4=0 

Moreover,  W  is  not  unique:  we  have  If  *  W\  =  1/  *  W2  for  all  f  G  yx  precisely  when 
{Tx®yWi)(x,y)  =  (F x®yW2)(x,y)  whenever  y  fO  or  (x,y)  =  (0,0). 

Proof.  We  begin  by  noting  that  0  is  a  generalization  of  the  local  histogram  transform. 
Indeed,  in  Theorem  1(b)  we  showed  that  the  local  histogram  transform  (1)  can  be 
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computed  as  LH wf  =  1/  *  (w  <g)  <50)  for  any  w  G  M*.  We  generalize  this  by  letting 
©(/)  =  If  *W  where  W  is  any  function  from  X  ®y  into  M;  here,  we  note  that  when 
W  =  w<£> So  this  transformation  is  equal  to  the  local  histogram  transform.  (=^)  Before 
showing  that  (i),  (ii),  and  (iii)  hold,  we  begin  by  noting  that  they  are  generalizations 
of  Proposition  2(b)  and  (c)  and  Theorem  4,  respectively;  although  the  proofs  of  these 
facts  are  straightforward  generalizations  of  the  proofs  of  Proposition  2(b)  and  (c)  and 
Theorem  4,  we  present  them  here  for  the  sake  of  completeness.  To  show  (i),  we  note 
that  It*/  =  T^,0^l/,  implying: 

0(T xf)  =  1T*/  *W  =  (T(x’0)1  /)  *  W  =  T(*’0)(l/  *W)  =  T(*’o)0(/). 
Similarly,  the  fact  that  T^’^l/  =  l/+yi  implies  (ii): 

T  (°’»)0(/)  =  T(0’y)(l/  *W)  =  (T(0’y)lf)  *  W  =  Ij+yi  *W  =  ©(/  +  yl ). 

For  (iii),  let  (x,y)  E  X  ©  y  and  let  $  be  any  flat  occlusion  model.  We  have: 
E^fe^CC^I/n}^1)]^,?/)  =  ^  FMM0CCv{ fn}^)](x,y) 

=  Y  ^(^(W/^-o i*w)(x,y) 

=  YP*W  Y  loCCv{fn}^x'iy')w(x-x'iy-y')- 

(.x’,y')ex®y 
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N-l 

Noting  that  (13)  gives  l^ix'  ,n)lfn{x'  ,y')  =  locc  {/„}J v-i(x',y'),  this  becomes: 

n=0 

E$  [0(occ*{/X=o)]  (x,  y ) 

N-l 

=  P<^(^)  Y  Yl^x^n^lf^x^y'^w^x~x^y ~y') 

(x',y')ex®y  n= 0 

N-l 

=  Y  Y  Y  p^)1Ax^n)1fn(x',y,)w(x-x',y -y')- 

n= o  (x',y')eX®y 

Since  $  is  flat,  (16)  simplifies  this  to: 

N-l 

E*[Q(occ*{fn}%i£)](x,y)  =  Y  xn  Y  1fn(x^y')w(x-x^y~y>) 

n= 0  (x',y')€X®y 

N-l 

n— 0 
N-l 

=  Y  KQ(fn)(x,y). 

n= 0 

(<=)  This  direction  of  the  proof  is  substantially  more  involved  than  its  converse.  We 
first  show  how  (iii)  implies  that  ©(/)  —  ©(0)  distributes  additively  over  the  standard 
coordinate  vectors: 

e(/)  -  0(0)  =  ^[0(/(i)i)  -  0(0)].  (51) 

xex 

Here,  for  any  y  G  y,  y5x  denotes  the  function  from  X  to  y  that  has  value  y  at  x  and 
is  otherwise  zero.  To  prove  that  (51)  holds,  we  first  show  that: 

©(/)  -  0((1  -  Sx)f)  =  ©(/(*)*.)  -  0(0).  (52) 

Note,  we  assume  (50)  holds  for  all  flat  <h  and  all  {fn}n=o  in  yX ■  In  particular,  (50) 
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holds  for  the  specific  image  occlusion  model  <f>  whose  probability  density  function  is: 


P*M  = 


jY\  j  V7  —  for  some  x  G  X, 
0,  else. 


We  claim  this  <f>  is  flat  with  A0  =  1  —  Ai  =  py,  and  \n  =  0  for  all  n  >  2.  To  see 
this,  let  A  =  {8X>  :  x'  e  X},  and  note: 


a„=  J2  p*(v)=  E  p*(^+  E  p«(¥0=  E 


cp(x)=n 


v>eA 

cp(x)=n 


vt  a 

< p(x)=n 


x'ex 

5x,(x)=n 


The  An’s  may  then  be  computed  as: 


p*(40, 

n  =  0, 

x'ex 

x'^x 

p  *(6X), 

n  =  1, 

0, 

n  >  1, 

1  |A”|  ’  ^ 

~  *  — —  T)  —  1 

\x\-r  11 

0,  n  >  1. 

Having  that  the  <f>  defined  in  (53)  is  flat,  we  apply  our  assumption  (50)  to  it  in  a 
special  case.  Specifically,  fixing  any  /  in  yx  and  any  x  G  X,  y  G  y,  letting  fo  =  f 
and  fi  =  y8x ,  the  right-hand-side  of  (50)  is: 


E  A"e</»)  =  0  -  pi)0 (/)  +  Aew 
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Meanwhile,  it  follows  from  (53)  that  the  left-hand-side  of  (50)  is: 


E^occ^/J^o1)]  =  p-i>(¥,)0(occ^{/n}^=o1) 

=  +  5^p®Me(°ccv{/n}^o) 

<pea  ^^a 

=  ^  ©(occ^j/J^o1). 

x'&X 

Using  the  easily  verified  fact  that  occ s  ,{fn}n=o  —  (1  —  4')/o  +  4'/i  gives: 

E,[e(occt{/„}£01)]  =  ^  Xi  9«1  -  M/  +  M?A)) 

x'e* 

=  pE  0((1  -  ix)/  +  ySx)  +'£J  ©((1  -  4-)/)  •  (55) 

L  x'^x  J 

Substituting  (54)  and  (55)  into  (50),  and  multiplying  by  A  gives: 

0((1  -  8x)f  +  y5x)  +  0((!  -  4')/)  =  {\X\-  !)©(/)  +  0(1/4).  (56) 

x'^x 

Rearranging  (56)  so  that  all  “y”  terms  lie  on  the  left-hand-side  yields: 

0((1  -  8x)f  +  y5x)  -  Q(y8x )  =  (|*|  -  1  )0(/)  -  £  0((1  -  40/)-  (57) 

x'^x 

As  the  right- hand-side  of  (57)  is  obviously  constant  with  respect  to  y,  then  the  left- 
hand-side  is  implicitly  so.  In  particular,  the  left-hand-side  of  (57)  is  equal  to  itself  at 
V  =  0: 

0((1  -  4)/  +  ySx)  -  B(y8x)  =  0((1  -  8x)f)  -  0(0).  (58) 
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Now,  letting  y  =  f{x)  in  (58)  gives: 


©(/)  -  Q(f(x)6x)  =  0((1  -  Sx)f)  -  0(0), 


which  rearranged,  gives  onr  claim  (52).  Now,  using  (52)  to  prove  (51),  we  enumerate 
X  as  X  =  ,  and  write  the  left-hand-side  of  (51)  as  a  telescoping  sum: 


1*1  r  /  3~  1 

e(/) - 0(o)  =  E  e 

'  fc=i 


3= 1 


fc=l 


1*1  r  /  J-l  \  /  i-1 

E  e  /Ilf1-'5**)  -0 (a-'U/Ih1-^ 

j= i  L  '  fc=i  '  '  fc=i 


(59) 


Applying  (52)  to  (59)  where  “/”  is  /nl=i(l  —  and  “x”  is  then  yields  (51): 


©(/) 


Al  r  /  /  i-i  \ 

0(o)  =  E 

i=i  L  '  '  k=  i  ' 

1*1 

=  EW(v)<M-  0(°)] 
1=1 


=  E[e(/wy-e(0)]’ 

xex 


0(0) 


Having  (51),  we  now  use  it,  along  with  our  assumptions  (i)  and  (ii),  to  show  that 
there  exists  W  E  M*®^  such  that  @(f)  —  If  *W  for  all  /  €  yx .  Indeed,  we  have: 


6(/)  -  6(0)  =  El0(/W«  -  e(°)l 

xex 

=  EiefT'/W*)  -  0(T*o)] 

xex 

=  E  T(’’o)[0(/(x)<So)  -  6(0)]. 

xex 

For  any  fixed  /,  writing  our  sum  over  A  as  a  sum  over  (V  and  X  subject  to  the 
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constraint  that  f(x)  =  y  gives: 


0(f)- 0(0) E  -  0(0)] 


y&y  x£X 
f(x)=y 


EE  l/(a=,t/)T<-'0>[©(?/<50)  -  0(0)]. 

y&y  x£X 


(60) 


We  now  claim  that  there  exists  a  W  G  such  that: 


e(i/'5o)-e(o)  =  (T<0’»>-i)w;  Vyey. 


(61) 


To  be  precise,  we  define  W  in  the  frequency  domain  as: 


(  {Fw[e(i/(S0)  -  e(o)]}(x', -if) 


(F  x&yW)(x',y')  =  { 


3-2 ni(y-y')  _  ^ 


E[e(°)](0.*/). 

y&y 


,  y'  7^  0j 


=  (0,0),  (62) 


v 


0, 


x'^0,y'  =  0, 


where,  for  any  y'  ^  0,  y  is  chosen  such  that  y-y'  ^  Z,  where  this  dot  product  is  defined 
in  (48)  according  to  the  factorization  (47)  of  y  as  a  direct  sum  of  cyclic  groups.  To 
see  that  W  is  well-defined,  we  first  note  that  for  any  y'  ^  0  there  always  exists  at 
least  one  such  y.  Indeed,  since  y'  ^  0,  there  exists  at  least  one  index  k0  such  that 
y'[k 0)  7^  0  6  Picking  y  such  that  y[k)  is  1  when  k  =  ko  and  is  otherwise  zero, 


we  have: 


/  v-  y(k)y\k )  y'(k0 ) 

yy  =Zs  — tf —  =  ~T7  i  z- 


k=  1 


Mk 


M> 


ko 


To  show  that  W  is  well-defined,  we  must  further  show  that  for  y'  ^  0,  the  value  of 
(F  x®yW)(x' ,  y')  does  not  depend  on  the  particular  y  one  chooses  such  that  y  -y'  ^7L. 
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To  do  this,  take  any  y,  y,  y'  e  y  such  that  y  ■  y' ,y  ■  y'  ^  Z.  We  need  to  show  that: 

|F*w[e(y,5o)-e(0)]}(a',i/)  {F,tW[9(^o)  -  0(O)]}(V;/') 

e-2"(s-j')  _  1  e-27ri (y-y')  _  ■]_  '  '  ' 

To  prove  that  (63)  holds,  we  first  show  that: 

@(y8 o  +  j/l)  -  ©(2/1)  -  B(^o)  =  0(yho  +  1/1)  -  ©(2/1)  -  ©(W-  (64) 

To  prove  (64)  holds,  note  (51)  gives: 

0(yho  +  2/1)  -  ©(2/1)  =  [©(2/^0  +  2/1)  -  ©(0)]  -  [0(yl)  -  0(0)] 

=  +  2/l)(^)<5*)  -  ©(O)]  -  ^[©(yhx)  -  0(0)]. 

Noting  that  (y50  +  yl)(x)  equals  y  +  y  when  x  is  zero  and  is  y  otherwise,  this  becomes: 

Q(yS0  +  yl)  -  0(yl) 

=  [6(0  +  y)s0)  -  0(0)]  +  £>(!/«  -  0(0)]  -  £[e(i&)  -  e(o)] 

x^0  x£X 

=  0(0  +  v)S0)  -  0(0)  -  ]©(y<So)  -  0(0)1 

=  0(0  +  y)S0)  -  S(yS0).  (65) 

Interchanging  the  roles  of  y  and  y  in  (65)  gives: 

0(yho  +  1/1)  -  0(2/1)  =  0((y  +  y)S 0)  -  0(j/<5o).  (66) 

Solving  for  0((y  +  y)ho)  hr  (65)  and  (66)  gives: 

©Mo  +  2/1)  -  ©(2/1)  +  6(M  =  0(yS 0  +  yl)  -  0(yl)  +  0(y5o)- 
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Subtracting  Q(y80)  +  Q(y50)  from  this  yields  our  claim  (64).  Next,  by  adding  0(0)  to 
both  sides  of  (64),  and  using  the  assumption  (ii)  that  0  commutes  with  translation 
on  y ,  we  can  write  (64)  as: 

T(o-fi)0(j/<$o)  -  T(°>*>0(O)  -  Q(ySQ)  +  0(0) 

=  T(°>»)0(y5o)  -  T^0(O)  -  Q(yS0)  +  0(0), 


or  more  concisely,  as: 

(T(°^)  _  I)  [Q(yS0)  -  0(0)]  =  -  I)  [0(2/<Jo)  -  0(0)]. 

Taking  Fourier  transforms  then  gives: 

(e-M(SV)  _  l){Fw[e(,j.)  -  e(0)]}n',!,') 

=  -  i){Fw[egi„)  -  e(o)]}(x',t/'), 

for  any  (. x' ,y ')  G  X  0  y.  Since  y  ■  y'  ^  Z  and  y  ■  y  ^  Z,  dividing  gives  (63).  This 
concludes  the  argument  showing  that  W  is  well-defined  by  (62). 

We  now  note  that  W — implicitly  defined  in  (62) — is  real-valued,  as  claimed  in  the 
statement  of  the  result.  Since  W  G  CA®^  is  real- valued  precisely  when  its  Fourier 
transform  is  skew-symmetric  about  the  origin,  it  suffices  to  show  that: 

(F XeyW)(-x',-y')  =  [(F X(ByW){x\y')}\  V  (x',y')  G  *  ©  y.  (67) 

We  recall  our  assumption  that  0  is  any  transformation  from  yx  into  As  such, 

0(0)  is  real- valued,  and  so  (62)  gives  (Fx®yW)(0,  0)  is  real,  meaning  (67)  holds  at 
(x1,  y')  =  (0,  0).  Moreover,  for  y'  =  0,  (62)  gives  both  sides  of  (67)  to  be  0.  Meanwhile, 
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for  y'  y  0,  since  Q(y50)  —  ©(0)  is  real-valued,  its  Fourier  transform  satisfies: 


{F*w[eM  -  9(0)]}(— -y')  =  ({F.w[e(^)  -  0(O)]}(i', a'))-. 


Taking  any  y  such  that  y  ■  y'  ^  Z,  note  we  also  have  y  ■  {—y')  ^  Z  and  so  this  fact, 
along  with  (67),  gives: 


(F  x^yW){-x\-y') 


e-2ni(y-(-y'))  _  l 
(e-27ri (y-y')  _  1)* 
V?x<s,yW)(x',  </)]*. 


Having  that  the  W  defined  (62)  is  real-valued,  we  next  claim  that  W  indeed 
satisfies  (61).  In  the  frequency  domain,  (61)  is  equivalent  to  having: 

{*»«>[©(#*,)  -  0(0)]} (x',y')  =  (e-2'1*”')  _  l)(YX9yW)(x‘ y),  (68) 

for  all  x'  G  X,  y,y'  G  y.  For  y'  G  y  such  that  y  ■  y'  ^  Z,  note  that  (62)  immediately 
gives  (68).  Meanwhile,  for  y'  G  y  such  that  y  ■  y'  E  Z,  the  right-hand-side  of  (68)  is 
necessarily  0,  meaning  we  must  show  that: 

(F^ey[©(|/5o)  -  0(0)]} (a/,?/)  =  0,  V  (a/,  y)'  G  X  ©  y  s.t.  y  ■  y'  G  Z.  (69) 

To  do  this,  note  that  for  any  y  ■  y'  G  Z,  we  have: 

(F^lef!*)  -  0(0)]  }(x',y')  =  J2  £[e(iA)  -  0(O)](i,s/)e-M(*-2,|e-2'®  F) 

x£X  y£y 

x£X  y£y 
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It  thus  suffices  to  prove  that  for  any  fixed  x  e  X, 

5^[0(^o)  “  ©(0)](5^)  =  °-  (70) 

y&y 

Note  that  a  change  of  variables,  along  with  our  assumption  (ii),  gives: 

X][e(^o)  -@(°)](£,y)  =  Y  [®(y8o)  -®(Q)}{x,y -y') 

y^y  (y—y')&y 

=  ^{T(oy)[e(j/50)  -  0(o)]}(*,s) 

y’&y 

=  XleM>  +  !/'l)-e(i/l)](i,S).  (71) 

y’&y 

We  now  recall  (65)  which  gives  the  telescoping  sum: 

Y  [©(^o  +  y'  1)  -  ©(j/l)]  =  Y  +  ^o)  -  0OT))]  =  0.  (72) 

y'ey  y'&y 

Substituting  (72)  into  (71)  gives  (70),  which  in  turn  implies  (69).  Having  that  (68) 
holds  in  all  cases,  we  obtain  (61). 

Having  the  claim  (61),  we  substitute  it  into  (60): 

©(/)  -  0(0)  =  lf(x,y)T^°\T^  -  I )W 

yey  x£X 

=  E£  lf(x,y)(T^W  -T^W).  (73) 

y&y  x£X 

Since  for  any  fixed  x  G  X,  we  have  that  Yhyey^f{x->y)  =  1,  evaluating  (73)  at  some 
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{x',y')  G  X  ©  y  gives: 


[GCfl-eiMV^EE1/  {x,y)[(T^W)(x' ,y')  -  (T(xfi)W)(x' ,  y')\ 

y&y  x£X 

=  (1  f*W)(x',y')-  ZE  1  f(x,y)W(x'  -  x,y') 

y£y  x£X 

=  (lf  *W)(x',y')  -  ^W(x'  -  x,y') 
xex 

=  (1/  »  W)(x’,y')  -  J2(T(mW)(x',  V').  (74) 

x&X 

We  now  claim  that 

e(o)  =  Yi  T  (75) 

X  €lX 

To  see  this,  note  that  both  (XLev  T^,0)W)(a;/,  y')  and  [0(0)]  (  x',  y')  are  constant  with 
respect  to  x'\  for  the  former,  note  (Y^xex  T^^W)^',  y')  =  Y2xex  W(x,  y')\  for  the 
latter,  note  that  for  any  x  e  X,  assumption  (i)  gives: 

[e(o)](x',t/')  =  (9(t*'-*o)](»'>jO  =  {T^-^iefo)]}^',!,')  =  [e(o)](*,jO- 


We  therefore  treat  both  of  these  functions  as  functions  solely  of  y' .  In  particular,  in 
order  to  prove  (75),  it  suffices  to  show  that  it  holds  in  the  frequency  domain  of  W 


[Fj-e(o)](B') 


rT(% 


x£X 


(y% 


Vy’  e  y. 


(76) 


To  show  that  (76)  holds  for  any  y'  ^  0,  pick  y  such  that  y  ■  y'  ^  Z  and  let  f  =  y  1 
in  (74): 

(Ty  -  1)0(0)  =  0(yl)  -  0(0)  =  lyl  *  W  -  J2  T {x’0)W. 

xex 
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Using  the  easily  verified  fact  that  lyl*W  =  YIxgx  T^’^W,  this  becomes: 


(Ty  -  1)0(0)  =  ^{x,v)W  -  T (x'0)W  =  (Ty  -  I)  T{X’0)W. 


Taking  Fourier  transforms  of  this  over  y  gives  that: 


(e"2 "vy'  -  l)[Fy0(O)](r/')  =  (e"2^'  -  1)  \py  (  ^  T{x’0)w\  1  (y').  (77) 

\  /  - 


Since  y  ■  y'  ^  Z,  we  may  divide  by  (e  2my'y'  —  1)  to  obtain  (76)  in  this  case.  In  the 
remaining  case  where  y1  =  0,  the  right-hand-side  of  (76)  becomes: 


Fy(ET(S’°)W)  (°)  =  EE(T(I%)(^  E  W(x,y)  =  (FX(ByW)(0,0). 


y&y  x£X 


(x,y)&X®y 


From  the  definition  (62)  of  W,  this  becomes: 


Fy(ET(*’0)w)  (0)  =  X)[0(o)](y)  =  [Fy©(o)](o), 


namely  (76)  where  y'  =  0.  Therefore,  (76)  is  satisfied  giving  (75).  Substituting  (75) 
into  (74)  gives  our  result  that  0(f)  =  1/  *  W. 

All  that  remains  to  be  shown  is  that  our  choice  of  W  is  not  unique.  To  be  precise, 
the  argument  up  to  this  point  has  only  made  use  of  the  values  of  (62)  where  either 
y'  y  0  or  (x',y')  =  (0,0).  This  suggests  the  values  of  (F  x@yW)(xf  0)  where  i'  /  0 
are,  in  fact,  arbitrary,  even  though  they  were  chosen  to  be  0  in  (62).  This  is  indeed 
the  case:  ifl/*VFi  =  l/*  IIU,  then  0  =  1/*  (IUi  —  Wf)  for  all  /.  Taking  the  Fourier 
transform  over  X  ©  y  gives: 


0  —  (FXeylf)[FXey(Wi  —  W2)], 


(78) 


where: 


(fWW.o)  -  E  «-«'*'*»  £  i/(*,»). 

(x,y)ex®y  x&x  y&y 

Since  for  any  fixed  x,  ^2yey  1  f(x,y)  =  1,  we  have: 

(FTOj,1,)(x',0)  =  =  (F^l)(x')  =  \X\S0(x'), 

x&X 

which  equals  0  for  any  x'  X  0.  In  light  of  (78),  we  thus  have  that  (F  x®yW\){x,  0)  need 
not  be  equal  to  (F  x®yW2){x1  0)  for  x  X  0,  thus  giving  our  result:  If  *  W\  =  1/  *  IT2 
for  all  /  G  yx  precisely  when  (F^eybFiXa;,  y)  =  {¥ x®yW2)(x,  y)  whenever  y  ^  0  or 

(xi  y)  —  (0, 0).  □ 

In  summary,  in  this  chapter,  we  showed  that  local  histograms  have  unique  capa¬ 
bilities  with  regards  to  the  analysis  of  composite  images.  To  be  precise,  we  let  0  be 
any  (possibly  nonlinear)  function  that  transforms  images  in  yx  into  “joint  distribu¬ 
tions”  of  pixel  location  and  value  in  MA’®X  We  showed  that  if  such  a  0  commutes 
with  translations  on  X  and  y  and,  like  local  histograms,  distributes  on  average  over 
occlusions  (7),  then  0  is  necessarily  of  local  histogram  type ,  that  is,  is  of  the  form 
@(/)  =  If  *  W  for  some  W  G  iX®X  In  short,  Theorem  10  establishes  that  such 
operators — ones  that  filter  the  graph  of  an  image  as  opposed  to  the  image  itself — are 
uniquely  well-suited  to  analyze  composite  images  of  a  certain  type. 
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V.  Variance  of  Local  Histogram  Transforms 


5.1  Variance  and  Two-Flatness 

In  the  previous  two  chapters,  we  discussed  how  local  histograms  distribute,  on 
average,  over  flat  occlusion  models.  Having  this  understanding  of  the  average  behavior 
of  local  histograms,  in  this  chapter,  we  focus  on  the  subject  of  how  closely  this  average 
approximates  a  typical  local  histogram.  That  is,  we  compute  the  variance  of  the  local 
histograms  of  a  composite  image  (Theorem  11),  and  demonstrate  that  the  distance 
of  our  local  histograms  from  the  average  is  highly  dependent  on  the  scale  of  the  local 
histogram  window. 

To  compute  this  variance,  we  necessarily  make  several  simplifying  assumptions 
on  the  nature  of  the  random  images  in  question.  In  particular,  we  assume  that  our 
occlusion  model  $  is  two- flat,  meaning  that  on  average,  the  probability  that  $  chooses 
labels  n'  and  n"  at  two  separate  pixel  locations  is  equal  to  the  probability  of  choosing 
n'  and  n"  at  any  other  two  separate  pixel  locations;  formally,  <h  is  two-flat  if  there 
exists  nonnegative  scalars  such  that  for  any  n',n"  and  x',x"  G  X,  x'  x": 

T  P$(V?)1  ^(x'  ,n')lv(x”  ,n”)  =  \n'\n".  (79) 

Note  that  two-flatness  is  a  stronger  form  of  flatness,  that  is,  if  $  is  two-flat  (79),  then 
<h  is  flat  (16).  To  show  this,  we  begin  by  summing  up  (79)  over  all  n"\ 

N—l  N—l 

An'  y  \i"  =  y  p$(<p)i v(x',n')  y  l^-vo  =  y  p$ (8°) 

n"=0  n"=° 

where  the  last  equality  follows  directly  from  the  fact  that  for  any  fixed  tp,  we  have 
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that  Yln"=o  1  ip(x",n")  =  1.  Now,  summing  over  all  n'  gives: 

,N-1  s  2  N—l  N—l  N- 1 

( An' )  =  5Z A"'  An" =  5Z p*(^)  n/) =  x> 

'  n'=0  '  n'=0  n"=0  'P&'fj  n'= 0 

which,  since  the  An/’s  are  nonnegative,  gives  that  Y^n'=o^n'  =  1-  Note  that  in  light 
of  this  fact,  (80)  reduces  to  (16),  namely  that  <f>  is  flat. 

Having  this  property  of  two-flatness,  we  can  express  the  variance  of  the  local 
histograms  of  a  composite  image  as  a  combination  of  the  local  histograms  of  each 
image  with  respect  to  the  square  of  the  weighting  function: 

Theorem  11.  Let  {fn}n=o  b e  any  sequence  of  images  in  yx  which  exhibits  unique 
pixel  values,  that  is,  fn'(x)  =  fn"(x)  implies  n'  =  n" .  Then  for  any  two- flat  N -image 
occlusion  model  and  any  function  w  in  R*,  the  variance  of  the  local  histogram 
transform  (1)  of  a  composite  image  (7)  is 

N- 1 

V*(LHwocc *{fn}”^)(x,y)  =  An(l  -  An)(LH ^fn){x,y).  (81) 

n= 0 

Proof.  We  assume  that  <f>  is  two-flat,  which,  as  noted  above,  implies  that  is  also 
flat.  As  shown  in  Theorem  4  and  generalized  in  Theorem  10,  since  <f>  is  flat,  we  have: 

N—l 

E*(LKwocc*{fn}%I+)(x,y)  =  An(LH wfn){x,y).  (82) 

n=0 

Therefore,  we  can  compute  the  variance  as: 


V$(LH^occ${/„}()r=01 ) 


E<e,[(LHwocc${/41'o1 

E*[(LHwocc*{/X=“o 


-  [E^LH^occ^/XAo)]2 

r  N—l  2 

^  ^  ^n(LHw;/n)(x,  y) 

n= 0 


(83) 
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To  compute  E$[(LHu,occ${/n}^=01)2],  it  follows  from  (1)  that: 

E*[(LHwOCC, ${fn}n=o)2}(x,y) 

=  X  pM(^woccv{fn}^)2(x,y) 

=  Y  P<3>^  X  W(X')5y((OCCMn}n=o)(X  +  X')) 

'-x'ex  -I 

=  Y  Y  W(X’  ~  X)Sv(( OCCMn}n=o)(X'))  ■  (84) 

‘-x'ex  -I 

Recalling  (13),  we  have  that: 

N- 1 

SydoCcMn^oW))  =  Y  ivW’^WnW))’ 

n= 0 

and  so  (84)  becomes: 

E*[(LHwocc  *{fn}n=o)2](x,y) 

r  N—l  1  2 

=  X]  P<1>^)  X  w(x>  -  X  lv(x,’  n,)Sy(fn'(x')) 

'-x'&x  n'= 0 

N- 1  JV— 1 

=  X  X  W(X'  -  X)W(X"  -  x)  X  X  Sv(fn'W))Wn”W)) 

x'SLX  x"£X  n'= 0  n"= 0 

x  X  P$(¥,)lv(^/ yn')lv(x" ,n"). 

<p£Z* 

We  now  break  up  the  sums  over  x',x"  G  X  into  a  sum  where  x'  =  x"  and  a  sum 
where  x'  ^  x",  and  note  that  for  x'  ^  x"  the  sum  over  <p  G  Z^  reduces  to  (79)  since 
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$  is  two-flat: 


E$[(LHwOCC  *{fn}n=o)2](x,y) 

N- 1  N- 1 

=  EM^-^E  E  Wn'(x'))Sy(fn"(x'))  Y  P *{ip)l<p{rf ,n')lv{ri ,n") 
x'£X  n'=0n"=0 

N- 1  JV— 1 

+  E  W(x'  -  x)w(x"  ~  X)  E  E  VAn/'.  (85) 

x'^x"  n'= 0  n"=0 

For  the  first  summand  in  (85),  we  have  that  l¥,(x/,  n/)l¥,(x/,  n")  is  zero  when  n’  ^  n" 
and  is  1  v(x',n')  when  n’  =  n" .  This,  along  with  the  fact  that  <f>  is  flat,  reduces  the 
Erst  term  in  (85)  to: 

N—l  N- 1 

E^®'  _a;)]2  E  E  Wn'W))Wn”W))  Y  P* 

x'ex  n'=0n"=0 

N—l 

=  J^[w(x'  -  x)]2  Y  5v(fn'(x'))  Y  P^EE™') 

x'ex  n'= 0 

N—l 

=  E An'  E  _ 

n'=0  x'£X 
TV— 1 

=  Aw/(LHwa/w>)(g,y).  (86) 

n'=0 

Meanwhile,  we  write  the  second  term  in  (85)  as  the  sums  over  all  x' ,  x"  e  T  minus 
the  sum  over  a/  =  x": 

N- 1  JV-1 

E  _  ~  X)  E  E/nE))ETn"E))An'An" 

x'j^x"  n'= 0  n"=0 

TV— 1  TV-1 

=  E  E  W(X/  _  X)WE  -  x)  Y  E  E/nEOK(/n"E))An'An" 

x'£X  x"£X  n'=0n"=0 

TV— 1  TV-1 

-  E  ~  E  E  ^(/«'(a;/))(5J/(/n''(a;/))An'Ar)».  (87) 

x'£X  n'= 0  n"=0 
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Here,  the  last  term  in  (87)  is  zero  unless  fn'{x')  =  fn"(x')  which,  from  our  assump¬ 
tions,  implies  that  n!  =  n" .  As  such,  (87)  becomes: 


w(x'  — 

x'^x" 


N—l  N—l 


Sy(fn'(x’))5y(fn”(x"))\  n,  \n" 

n'= 0  n"= 0 

N- 1  1  2 

Y  An'  Y  W(X'  ~~  X)Sy(fn'(X'))  ~  Y  X )]2  5v(fn'(X')) Xl' 

n' — 0  x'£X  x'aX  n'— 0 

AT— 1  2  7V-1 

An/(LHw/n/)(ic,2/)  -  An'(LHn,2/n,)(x,7/).  (88) 


N-l 


n'=0 


n'=0 


Substituting  (86)  and  (88)  into  (85),  and  combining  like  terms  gives: 


E$[(LHwocc${/ri,}Or=r01)2](a;,  y) 


N- 1 


r  N-l 


1  2 


^  ^  An'(l  An') (\jHw2  fn1') (:r,  i/)  A  ^  ^  An' (LH^y^j/) (x,  y) 


n'= 0 


■  n'= 0 


Substituting  this  into  (83)  gives  our  result  (81).  □ 

Theorem  11  shows  that  the  variance  of  the  local  histogram  transform  of  a  com¬ 
posite  image  depends  on  the  scale  of  the  support  of  w,  a  fact  which  is  demonstrated 
in  Figure  9.  To  see  this  more  clearly,  consider  an  example  where  y  =  Z m  and 
X  =  7Ll  ©  7Ll.  In  particular,  let  the  /„’ s  be  distinct,  constant  functions,  that  is, 
fn  =  rnnl  where  mn  G  7Lm  and  mn>  =  mn»  if  and  only  if  n!  =  n".  Let  w  G  be  the 
square  weighting  function  defined  by: 


w(h,l2) 


(2J+1)2’  ^1)^2  £  [  Jj  J]i 

0,  else, 


(89) 


where  J  <  L.  Let  pn  denote  the  probability  of  choosing  label  n  G  ZN.  Noting  that 
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Figure  9.  An  example  demonstrating  the  importance  of  the  scale  of  the  local  histogram  window. 
Recall  the  example  given  in  Figure  2.  The  theory  presented  in  this  dissertation  has  provided  the 
theoretical  justification  for  a  certain  type  of  segmentation-and-classification  algorithm.  That  is,  we 
assign  a  label  to  any  given  pixel  in  Figure  2(e)  by  comparing  its  local  histogram  to  the  three  distinct 
distributions  of  pixels  found  in  Figure  2(a),  (b)  and  (c).  In  doing  so,  we  hope  to  obtain  results  that 
compare  well  against  the  ground  truth  image  (a).  In  (b),  we  see  that  the  support  of  the  window  plays 
an  important  role  in  how  accurately  our  algorithm  segments-and-labels  our  image  Figure  2(e) — a 
fact  which  is  justified  in  Theorem  11.  In  this  example,  our  window  is  defined  as  a  box  filter  (89)  of 
“radius”  J.  That  is,  the  box  has  sides  of  length  2  J  +  1;  (b)  gives  the  accuracy  for  this  classification 
for  windows  of  length  2  J  +  1  =  1, . . . ,  31,  that  is,  J  =  0, . . . ,  15.  When  our  window  is  too  small, 
we  get  a  “noisy”  segnrented-and-labeled  image,  such  as  the  one  given  in  (c)  where  2J  +  1  =  3.  In 
(b),  we  see  that  our  classification  accuracy  is  highest  when  2J  +  1  =  7.  Indeed,  using  this  window 
in  our  “nearest  histogram”  classification  scheme  results  in  (d)  which  compares  well  against  ground 
truth  (a).  Finally,  if  we  choose  our  window  to  be  too  large,  our  classification  accuracy  suffers  from 
oversnroothing  as  seen  in  (e)  which  was  obtained  by  letting  2  J  +  1  =  29. 
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w  —  w,  it  follows  from  Theorem  1(a)  that: 


(LHm,2  fn)(x,y)  =  ( w 2  *  lf-i{y})(x)  =  XI  u’2(a;/)1/n  _  x')- 

x'ex 

Noting  that  our  definition  of  fn  gives  that  fn(x  —  x')  —  y  if  and  only  if  mn  =  y,  it 
follows  from  (2)  that: 


(LH w*fn){x,y)  =  X  w2(x')8y(mn)  =  8y(mn )  X  (2J+IY  =  (2J  +  If2’ 

x'£s\ipp(w) 

where  the  last  equality  follows  from  the  fact  that  |supp(w)|  =  (2 J  +  l)2.  This  then 
gives  that  (81)  becomes: 


V$(LHwocc<E,{/n}OL01)(^,  y) 


1 

(2J  +  1)2 


N—l 


X^1 

n=0 


Pn)3y(jYlri)  • 


Here,  we  indeed  see  that  as  the  support  of  the  window  increases,  our  variance  becomes 
smaller. 

We  note,  however,  that  this  is  under  the  assumption  that  the  window  lies  com¬ 
pletely  within  a  region  for  which  the  two-flat  hypothesis  holds.  Indeed,  as  the  scale  of 
the  window  grows  large,  we  expect  this  result  to  not  hold  when  applied  to  images  that 
are  composites  of  several  distinct  textures,  each  arising  from  its  own  unique  image 
model,  such  as  Figure  2(e).  In  particular,  when  the  scale  of  the  window  is  large,  the 
local  histogram  at  a  pixel  will  likely  mix  in  the  histograms  of  the  neighboring  tex¬ 
tures.  Therefore,  the  local  histogram  at  that  particular  pixel  will  not  compare  well 
against  the  histogram  of  any  one  given  texture,  causing  the  accuracy  of  our  “nearest 
histogram”  classification  scheme  to  decrease.  This  intuition  is  confirmed  by  the  de¬ 
creased  performance  of  our  classification  algorithm  as  the  scale  of  the  window  grows 
large,  see  Figure  9(b)  and  (e).  Having  seen  that  two- flatness  is  important  in  under- 
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standing  the  variance  of  the  local  histograms  of  a  composite  image,  we,  in  the  next 
section,  consider  if  two-flatness  is  a  realistic  assumption  in  terms  of  our  motivating 
application. 


5.2  Constructing  Two-Flat  Occlusion  Models 

In  this  section,  we  demonstrate  that  two-flatness  is  a  reasonable  assumption  by 
providing  a  couple  methods  for  constructing  two-flat  occlusion  models.  Recall  that 
in  a  Bernoulli  model,  we  define  the  probability  of  picking  any  particular  p  G  iflf  as 
p*(v>)  =  EL*  p<p(x)  •  Not  only  is  the  Bernoulli  random  variable  <f>  flat  as  shown  in 
Theorem  5,  but  it  is  also  two-flat: 

Theorem  12.  If  Q  is  a  Bernoulli  random  variable,  then  <f>  is  two-flat. 

Proof.  To  show  this,  first  fix  some  xk>o,  xk>>  G  X,  xy0  xk^.  Paralleling  the  proof  of 
<f>  being  flat  (Theorem  5),  it  follows  that: 

N—l  N- 1 

Y  P^((pMxK,n')lv(xk-,n")  =  Y  Pn i  •"  Pn«  K{xk'0in')lv{xkn,n"). 

”1=0  nK= o 

ip(xk)=nk,Vk 

As  the  constraints  on  the  sum  over  ip  G  uniquely  define  a  single  p,  we  have  that 
the  sum  is  one  if  n'  =  nyQ  and  n"  =  nk»  thus  giving: 


Y  F*((P)1<p(xk'0,n')lv(xk»,n") 

‘ P^N 


N—l  N- 1 

^  ^  Pn\  '  '  ‘  y  ^  PriK^n'  (nk'0)fin"(nkQ  ) 
ni=0  riK= 0 


N—l 


n.i  =0 
K0 


Y,  Pnk>J>n' (nk ')  )  (  Yj  Pnk"^n"(nK 


N- 1 


nui  =0 
K0 


N- 1 


n 

kpk'0,k "  Xnk= 0 


Pn'  Pn"  ■ 


□ 
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Having  shown  that  Bernoulli  random  variables  are  two-flat,  we  now  consider  an¬ 
other  method  for  constructing  two-flat  occlusion  models  which  uses  the  overlay  oper¬ 
ator  (31).  In  Theorem  8,  we  showed  that  if  <h,  T,  and  E  were  flat,  then  is  also 

flat.  Here,  we  show  a  similar  result  holds  when  flatness  is  replaced  with  two-flatness: 

Theorem  13.  //$,  T,  and  E  are  two-flat  (79),  then  <h#s'h  is  two-flat. 

Proof.  We  begin  by  noting  that  since  <h,  T,  and  E  are  two-flat,  they  are  also  flat. 
We  want  to  show  that  is  two-flat  (79),  meaning  that  for  any  n',n "  G  Z^+Ay, 

and  x',x''  G  X  such  that  x'  x",  our  goal  is  to  show  that  there  exists  scalars 
,n}n=oN'i~1  such  that: 


E 


—  A<3 


v£Zj 


X 


v(x')=nf 
v(x  ")=n" 


(90) 


To  see  this,  note  that  for  any  n',  n"  =  0,  •  •  •  ,  iV$  +  Ny  —  1  and  x',  x"  G  X  such  that 
x'  x" ,  it  follows  from  the  ideas  used  in  (35)  that: 


’^N^+Ny 

v(x')=n' 
v(x")=n " 


=  J2  P^MP^WPe^). 

(<P#cr4’)(x')=n' 

(.‘P#*4’)(x")=n" 


(91) 


Note  that  if  n  —  0, . . . ,  iV$  —  1,  (31)  gives  that  {Lpflafl>){x)  =  n  if  and  only  if  ip{x)  =  n 
and  a(x)  =  0.  Meanwhile,  if  n  =  iV$, . . . ,  N$  +  Ny  —  1,  (31)  gives  that  ((p#a/ip)(x)  =  n 
if  and  only  if  if(x)  =  n  —  N<p  and  a(x)  =  1.  We  now  use  these  facts  to  compute  (91) 
for  any  n',  n"  G  Z^+jv*-  In  the  case  where  n',  n"  =  0, . . . ,  iV$  —  1,  since  <h  and  E  are 
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two-flat,  (91)  becomes: 


£ 

v£Zn<s>+nx!! 

v(x')=n' 

v{x")=n" 


V)  = 


£  p»(*0  P*(V)  peM 


if(x')=n' 

<p{x")=n" 


r^Z2 


<r(V)= 0 
cr(x")= 0 


—  A$,n' A<J>,n"  Avg. 


(92) 


Meanwhile,  when  n!  —  0,  •  •  •  ,  Mj>  —  1  and  n"  =  N<s>,  •  •  •  ,  1V$  +  Ny  —  1,  since  <f>  and  T 
are  flat  and  £  is  two-flat,  we  have  that  (91)  becomes: 


y  p^e^co) 

=  £  p*(v>)  £  p*w  £  PeW 

v(x')=n ' 
v(x  ")=n" 

a€Z2 

( p(x')=n '  ip(x")=n" —N$  a(x')=0 

a{x")=l 

=  A<j,n/A^irl//_7v<i>A£)oAs 

(93) 

Similarly,  for  n'  =  N$,  •  •  •  ,  N$  +  N $  —  1  and  n"  =  0,  •  •  •  ,  iV$  —  1,  (91)  becomes: 

y.  p$#E<p(^) ; 


vgZn,s,+nxS, 
v(x')=n' 
v{x")=n " 


y  p$(<d)  y  y  Ps( 

<p(x")=n" 


(?) 


crEZ^ 


'ip(x,)=n/ — cr(x')—l 
*(x")=0 


—  A^n/'A^z-A^A^oAv  i.  (94) 

Finally,  for  n',  n"  =  N$,  •  •  •  ,  N$  +  Ny  —  1,  since  T  and  £  are  two-flat,  (91)  becomes: 


y  p$#E*(o)  =  y  p*m  y  p*wo  y  Psi 


O' 


VeZN^+N^ 

v(x')=n' 

v(x")=n" 


i>{x')=n'-N *  cr(x')=l 

ip(x")=n"-N *  °(x")= 1 


—  A^z-TV^A^n/'-A^Al  )1.  (95) 

Thus,  for  any  x',x"  G  X  such  that  x'  ^  x" ,  we  have  (92),  (93),  (94),  or  (95)  meaning 
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is  two-flat  (90),  as  claimed.  □ 

Having  demonstrated  the  potential  usefulness  of  local  histograms  in  the  analy¬ 
sis  of  images  composed  of  distinct  texture  types,  we  now  turn  to  using  them  in  a 
segmentation-and-classihcation  algorithm. 
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VI.  A  Proof-of-Concept  Classification  Algorithm 


In  this  chapter,  we  discuss  a  preliminary  segmentation-and-classification  algorithm 
inspired  by  Theorem  4  in  which  local  histograms  are  decomposed  using  principal 
component  analysis  (PCA).  In  particular,  the  work  presented  in  this  dissertation 
was  motivated  by  the  need  to  identify  and  delineate  the  various  tissues  exhibited 
in  images  of  histological  sections  of  teratoma  tumors  derived  from  embryonic  stem 
cells,  such  as  the  one  given  in  Figure  1(a).  This  image  was  provided  by  Dr.  Carlos 
Castro  of  the  University  of  Pittsburgh  and  Dr.  John  A.  Ozolek  of  the  Children’s 
Hospital  of  Pittsburgh,  who  grow  and  image  such  teratomas  to  gain  greater  insight 
into  tissue  development.  These  tumors  begin  as  masses  of  undifferentiated  cells  that 
are  implanted  in  laboratory  animals.  Over  time,  these  tumors  grow  and  their  cells 
differentiate  into  many  various  types — bone,  cartilage,  skin,  etc. — until  a  point  at 
which  they  are  excised,  sectioned,  stained  and  viewed  under  a  microscope,  resulting 
in  images  such  as  the  one  in  Figure  1(a).  As  such,  these  images  exhibit  a  wide  variety 
of  tissue  types,  arranged  in  a  seemingly  random  fashion.  Indeed,  to  a  casual  observer 
such  images  can  appear  as  a  jumbled  mess.  In  truth  however,  the  arrangement  of  these 
tissues  is  not  completely  random,  and  is  rather  the  result  of  not  yet  well-understood 
biological  mechanisms.  Drs.  Castro  and  Ozolek  believe  that  by  looking  at  many  such 
images — many  sections  of  many  teratomas — they  can  gain  greater  insight  into  these 
mechanisms. 

Up  to  this  point,  we  have  presented  a  theoretical  justification  for  using  local 
histograms  to  segment  and  label  images  like  Figure  1(a).  In  this  chapter,  we  discuss 
a  preliminary  segmentation-and-classihcation  algorithm  inspired  by  Theorem  4.  We 
emphasize  that  for  the  algorithm  presented  here,  local  histograms  are  the  only  image 
features  that  are  computed.  That  is,  the  decision  of  which  label  to  assign  to  a  given 
pixel  is  based  purely  on  the  distribution  of  color  in  its  surrounding  neighborhood.  We 
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do  this  to  demonstrate  the  validity  of  the  concept  embodied  by  Theorem  4  as  an  image 
processing  tool.  For  algorithms  intended  for  real-world  use,  such  color  information 
should  be  combined  with  morphological  data — size,  local  and  global  shape,  orientation 
and  organization — in  order  to  obtain  better  classification  accuracies.  An  example  of 
such  an  algorithm,  accompanied  by  thorough  testing  and  comparisons  against  other 
state-of-the-art  methods,  is  given  in  [3];  these  facts  are  not  reprinted  here. 

From  the  point  of  view  of  our  motivating  application,  the  significance  of  The¬ 
orems  4  and  11  is  that  they  give  credence  to  a  certain  type  of  segmentation- and- 
classihcation  algorithm.  To  be  precise,  given  a  set  of  training  images  which  are  man¬ 
ually  segmented  and  labeled  by  medical  experts,  we,  for  any  given  tissue  type,  can 
compute  local  histograms  over  regions  which  are  labeled  as  that  type.  In  light  of  The¬ 
orem  4,  it  is  reasonable  to  demix — decompose  into  convex  combinations — the  local 
histograms  of  that  type  into  a  set  of  more  basic  distributions.  Indeed,  it  is  reasonable 
to  expect  a  local  histogram  computed  over  a  region  of  cartilage  (Figure  3(b))  to  be 
a  mixture  of  0.8  of  a  “light  purple”  distribution — a  distribution  mostly  supported  in 
portions  of  y  that  correspond  to  light  purple — with  0.2  of  a  darker  reddish-purple 
one.  Meanwhile,  local  histograms  of  other  tissues  will  correspond  to  distinct  mixtures 
of  other  distributions.  For  example,  local  histograms  computed  over  a  region  of  pseu- 
dovascular  tissue  (Figure  3(d))  might  be  a  mixture  of  0.5  of  a  light  pink  distribution, 
with  0.25  of  a  dark  purple  one  and  0.25  of  a  reddish-pink  one.  Once  sparse  demixings 
of  each  tissue  type  have  been  found,  we  then  use  them  to  segment  and  classify:  given 
a  new  image,  we  assign  a  label  at  any  given  point  by  determining  which  particular 
set  of  learned  distributions  its  local  histogram  is  most  consistent  with. 

The  algorithm  we  present  in  this  chapter  exploits  this  concept.  The  first  step  is 
to  train  our  classifier.  To  do  so,  let  K  be  the  number  of  distinct  tissue  types  found 
in  a  training  image  such  as  Figure  10(a)  or  (d).  For  any  tissue  type  k  —  1, _ ,  K,  we 


82 


compute  local  histograms  {hk-m}mti  about  pixel  locations  {xk„m}m=i  that  have  been 
labeled  as  being  of  that  type  by  medical  experts.  We  note  here  that  the  computation 
of  the  variance  of  the  local  histogram  transform,  which  was  given  in  the  previous 
chapter,  demonstrated  the  importance  of  the  scale  of  the  local  histogram  window.  In 
our  algorithm,  we  use  a  Gaussian  window  and  have  chosen  the  scale  experimentally 
to  have  a  standard  deviation  of  16\/2.  Each  hk-m  is  a  nonnegatively-valued  function 
over  y  that  sums  to  one.  There  are  several  ways  to  pick  the  Xfc;m’s.  One  approach 
is  to  have  the  expert  choose  each  point  individually.  Alternatively,  if  the  expert  has 
manually  segmented  and  labeled  the  entire  image  (Figure  10(b)),  then  the  ream’s  can 
be  chosen  at  random  from  regions  of  type  k.  The  number  of  local  histograms  that 
we  compute  for  type  k  is  somewhat  arbitrary;  we  used  repeated  experimentation  to 
find  a  sample  size  large  enough  to  guarantee  reliably-decent  performance. 

In  light  of  Theorem  4,  it  would  be  nice  to  demix  the  training  local  histograms 
{hk;m}m= i  hi  terms  °f  a  type-dependent  class  of  more  basic  distributions  {gk-n}n= i- 
That  is,  we  would  like  to  End  nonnegatively-valued  functions  g^n  over  y  that  sum 
to  one  and  have  the  property  that  for  each  training  local  histogram  hk-m  there  exists 
nonnegative  scalars  {A k;m,n}n=i  that  themselves  sum  to  one  and  such  that: 

Nk 

hk;m  ~  ^  ^  A k;m,ngk,ni  A / / 1  1,  .  .  .  ,  Mfc.  (96) 

n=  1 

Unfortunately,  computing  the  g^.y s  that  minimize  the  approximation  error  in  (96)  is 
a  nontrivial  optimization  problem.  As  such,  we  leave  this  approach  for  future  work, 
and  instead  consider  a  mathematically-simpler  problem  in  which  the  \k-,m,nS  and 
g^.y s  are  permitted  to  be  arbitrary  real  scalars  and  vectors,  respectively.  That  is,  we 
perform  PGA  for  each  tissue  type  k.  PCA  is  a  technique  from  linear  algebra  that 
takes  a  set  of  correlated  local  histograms,  such  as  those  computed  at  pixels  of  the 
same  tissue  type,  and  converts  the  set  into  distributions  that  are  uncorrelated;  these 
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Figure  10.  An  example  of  using  PCA  of  local  histograms  to  perform  segmentation  and  classification 
of  the  image  given  in  (a),  which  is  a  3-bit  quantized  version  of  Figure  1(a).  A  manually  segmented  and 
labeled  version  of  (a)  is  shown  in  (b)  where  black  represents  cartilage,  light  gray  represents  connective 
tissue,  dark  gray  represents  pseudovascular  tissue,  and  white  represents  other  tissues  that  have  been 
ignored  in  this  proof-of-concept  experimentation.  That  is,  the  (ignored)  white  pixels  in  (c)  and  (e)  are 
copied  directly  from  (b),  while  the  black  and  gray  pixels  are  the  result  of  our  classification  algorithm. 
Using  (a)  as  both  the  training  and  testing  image  in  a  PCA-based  classification  scheme  (99),  we  obtain 
the  labels  shown  in  (c).  A  similar,  but  less-accurate  classification  of  (a)  can  still  be  obtained  if  we 
instead  train  on  (d),  resulting  in  the  labels  given  in  (e). 
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distributions  are  called  the  principal  components.  The  first  principal  component  has 
the  highest  possible  variance  overall  while  the  other  components  have  the  highest 
possible  variance  with  respect  to  the  restriction  that  they  must  also  be  orthogonal  to 
each  other. 

To  be  precise,  for  each  type,  we  form  a  |^|  x  Mk  matrix  H k  whose  columns  are 
the  (vectorized)  local  histograms  hk-m  less  their  average  hk: 


Hk(:,m)  —  hk.m  -  hk  where 


1  Mk 

hk  =  —  y  h. 
M,  ^ 


k;m • 


(97) 


m—  1 


We  then  compute  the  singular  value  decompositions  Hk  =  UkY>kVk  and  identify  those 
left-singular  vectors  {uk-n}n=i  that  correspond  to  some  experimentally-determined 
number  Nk  of  dominant  singular  values  {crk-n}n=i-  In  this  setting,  the  approxima¬ 
tion  (96)  is  replaced  by: 


Nk 

hk,m  ~  hk  T  ^  ^  hfc,  Ukyfi)Ukyni  Vl7l  1,  .  .  .  ,  Mk •  (98) 

n=  1 


The  classical  theory  of  PCA  states  that  the  approximation  error  in  (98)  is  opti¬ 
mally  small  in  the  sense  that  these  specific  uk-n  s  span  the  particular  AW  di  mens  ion  al 
subspace  of  iW  whose  orthogonal  projection  operator  Pk  minimizes  the  total  squared- 
error  )Cm=i  WKm  -  hk  -  Pk(hk;m  ~  K) ||2.  The  vectors  hk  and  {Mfc;n}^1  in  hand,  we 
store  them  in  memory,  completing  the  training  phase  of  our  classification  algorithm. 

To  segment  and  label  a  given  image  /,  we  compute  its  local  histograms  (1),  ob¬ 
taining  local  distributions  of  color  hx  :  y  >  M,  hx(y )  =  (LH wf)(x,y)  about  every 
pixel  location  x  G  X .  At  any  given  x ,  we  then  assign  a  tissue  label  k(x)  by  hireling  the 
tissue  type  k  whose  shifted  subspace  hk  +  span{rtfc;n}^:1  is  nearest  to  hx.  Specifically, 
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we  let: 


Nk 


k(x)  =  argmin  hx  -  hk  -  }{hx  -  hk,  uk.n)uk.r 

k=l,...,K 


n= 1 


Nk 


=  argmin  ( \\hx  -  hk ||2  -  \(hx  —  hk,  uk.n) 

k=l,...,K  v 


n=  1 


argmin  IE  I  My)  -  Mi/) I2  -  EE  (My)  -  hk(y))uk.n(y ) 

■yey  n= i  j/ey 


(99) 


In  implementation,  we  compute  the  summations  over  (V  hr  (99)  as  running  sums, 
looping  over  all  y  G  y.  This  computational  trick  greatly  reduces  our  memory  re¬ 
quirements:  at  any  given  time,  we  only  store  a  single  level  of  LHW/.  By  Theorem  1, 
such  a  level  can  be  obtained  by  filtering  an  indicator  function;  in  the  following  exper¬ 
imental  results,  we  avoided  edge  artifacts  by  using  a  weighted  noncyclic  method  of 
filtering,  namely  the  ★-convolution  of  [33].  Without  such  a  trick,  one  must  store  the 
entire  local  histogram  transform  in  memory,  a  daunting  task  for  even  modestly-sized 
images:  the  full  local  histogram  transform  of  the  1200  x  1200,  8-bit  RGB  image  given 
in  Figure  1(a)  is  a  1200  x  1200  x  256  x  256  x  256  array. 

Further  computational  advantages  may  be  gained  by  quantizing  the  image  and 
reducing  the  dimension  of  the  color  space.  For  our  particular  set  of  histology  images, 
we  experimentally  found  that  we  could  still  obtain  good  accuracies  even  if  we  discard 
the  green  channel  of  our  purple-pink  images,  and  moreover  quantize  the  8-bit  red  and 
blue  channels  down  to  3-bits  apiece.  That  is,  we  quantize  y  from  Z|56  to  Z|.  By 
Proposition  2,  this  is  equivalent  to  binning  the  original  1200  x  1200  x  256  x  256  x  256 
local  histogram  array  down  to  a  new  one  of  size  1200  x  1200  x  8  x  8.  The  quantized 
version  of  Figure  1(a)  is  given  in  Figure  10(a);  for  the  sake  of  readability,  a  3-bit 
quanitized  version  of  the  unused  green  channel  was  included  in  this  rendering.  As 
a  result  of  this  quantization,  it  only  takes  a  few  seconds  to  assign  per-pixcl  labels 
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to  a  1200  x  1200  histology  image  using  a  MATLAB-based  implementation  of  (99), 
running  on  standard  desktop  hardware.  For  this  particular  set  of  images,  further  color 
quantization,  such  as  using  2-bit  colors  (3^  =  Z|)  or  converting  the  original  image  to 
grayscale  (3?  =  Z256),  results  in  an  unacceptable  loss  in  classification  accuracy,  as  do 
attempts  at  spatial  quantization  (A  =  Zg00). 

Two  runs  of  this  classification  algorithm  are  depicted  in  Figure  10.  In  the  first  run, 
we  train  the  classifier  on  the  3-bit  1200  x  1200  red-blue  image  given  in  Figure  10(a). 
For  the  sake  of  simplicity,  we  restrict  ourselves  to  K  =  3  tissue  types:  cartilage, 
connective  tissue  and  pseudovascnlar  tissue;  all  other  tissue  types  are  ignored  in 
the  confusion  matrices  given  below.  For  each  type  k  =  1,2,3,  we  randomly  choose 
Mk  =  64  points  of  that  type,  making  use  of  a  small  number  of  the  12002  ground 
truth  labels  given  in  Figure  10(b);  edge  artifacts  are  avoided  by  not  picking  points 
near  the  border.  For  each  type,  we  then  perform  PCA  on  the  64  local  histograms 
hk-m  of  that  type,  computing  an  average  local  histogram  hk  as  well  as  the  dominant 
left-singular  vectors  of  Hk  (97).  For  the  sake  of  simplicity,  in  a  given  experiment  we 
will  use  the  same  number  of  principal  components  for  each  of  the  three  types,  that 
is,  Nk  =  N  for  k  =  1,  2,  3.  At  the  same  time,  we  experiment  with  this  number  itself, 
letting  N  be  either  1,  2,  3  or  4.  With  the  training  complete,  we  then  segment  and 
classify  Figure  10(a)  using  the  decision  rule  (99),  resulting  in  per-pixcl  labels  such  as 
the  ones  given  in  Figure  10(c)  for  N  =  4.  Comparing  Figure  10(c)  and  the  ground 
truth  of  Figure  10(b),  we  see  both  the  power  and  limitations  of  local  histograms: 
color  is  a  big  factor  in  determining  tissue  type,  but  by  ignoring  shape,  we  suffer  from 
oversmoothing.  The  accuracy  percentages  for  various  choices  of  N  are  given  by  a 
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confusion  matrix: 


N  =  1  N  =  2  N  =  3  N  =  4 

Ca  Co  Ps  Ca  Co  Ps  Ca  Co  Ps  Ca  Co  Ps 
Ca  77  22  1  87  11  2  96  3  1  96  3  1 

Co  0  95  5  0  91  9  3  94  3  3  92  5 

Ps  2  11  87  2  8  90  6  7  87  5  5  90 

Here  each  row  of  the  matrix  tells  us  the  percentage  a  certain  tissue  was  labeled  as 
cartilage  (Ca),  connective  tissue  (Co),  and  pseudovascular  tissue  (Ps).  In  particular, 
the  first  three  entries  of  the  first  row  of  this  table  tell  us  that  when  using  a  single  prin¬ 
cipal  component,  those  points  labeled  as  cartilage  by  a  medical  expert  in  Figure  10(b) 
are  correctly  labeled  as  such  by  our  algorithm  77%  of  the  time,  while  22%  of  it  is 
mislabeled  as  connective  tissue  and  1%  of  it  is  mislabeled  as  pseudovascular  tissue. 
Note  here  that  we  have  trained  and  tested  on  the  same  image;  such  experiments 
indicate  the  feasibility  of  our  approach  in  a  semi-automated  classification  scheme  in 
which  a  medical  expert  handpicks  64  points  of  each  given  type  and  lets  the  algorithm 
automatically  assign  labels  to  the  rest. 

The  second  run  of  this  algorithm  is  almost  identical  to  the  first,  with  the  exception 
that  we  use  a  distinct  image  in  the  training  phase.  To  be  precise,  for  each  of  the  three 
tissue  types,  we  perform  PCA  on  the  local  histograms  of  64  randomly- chosen  points 
of  that  type  in  Figure  10(d),  making  use  of  its  ground  truth  labels  (not  pictured).  We 
then  apply  the  principal  components  obtained  from  Figure  10(d)  to  generate  labels 
(Figure  10(e))  for  Figure  10(a)  using  the  decision  rule  (99).  Compared  to  the  first 
run,  the  algorithm’s  performance  here  is  a  better  indication  of  its  feasibility  as  a 
fully  automated  classification  scheme,  and  is  summarized  by  the  following  confusion 


matrix: 


N  =  1  N  =  2  N  =  3  N  =  4 


Ca 

Co 

Ps 
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Ps 

Ca 

Co 

Ps 

Ca 
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Ps 

Ca 

90 

9 

1 
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90 
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11 

6 

Co 

25 

61 

14 

10 

62 

28 

7 

79 

14 

8 

70 

22 

Ps 

30 

12 

58 

4 

10 

86 

4 

50 

46 

2 

17 
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Though  the  performance  in  the  second  run  is  understandably  worse  than  that  of  the 
first,  it  nevertheless  demonstrates  the  real-world  potential  of  the  idea  exemplified  by 
Theorem  4:  the  local  histograms  of  certain  types  of  textures  can  be  decomposed  into 
more  basic  distributions,  and  this  decomposition  can  serve  as  an  image  processing 


VII.  Conclusions 


In  this  dissertation,  we  provided  a  rigorous  mathematical  analysis  of  local  his¬ 
tograms  to  be  used  in  the  segmentation  and  classification  of  histology  images.  In 
particular,  we  found  a  nice  relationship  between  local  histograms  and  images  that 
are  composites  of  several  distinct  textures:  local  histograms  distribute,  on  average, 
over  flat  occlusion  models.  Additionally,  we  showed  that  the  distance  of  our  local 
histograms  from  the  average  is  highly  dependent  on  the  scale  of  the  local  histogram 
window.  We  also  characterized  all  transforms  which  satisfy  the  three  key  properties 
of  local  histograms  thus  showing  that  local  histograms  are  essentially  the  only  trans¬ 
forms  that  distribute  over  random  composites,  on  average.  These  results  demonstrate 
the  usefulness  of  local  histograms  in  analyzing  textures  commonly  found  in  histology 
images  and  give  credence  to  certain  types  of  segmentation-and-classihcation  algo¬ 
rithms.  We  demonstrated  the  real-world  potential  of  these  ideas  by  using  them  to 
segment  and  classify  tissues  in  a  histology  image. 

We  now  discuss  several  ideas  for  future  work.  The  first  idea  arises  from  the  fact 
that  storing  local  histograms  of  an  image  requires  a  significant  amount  of  memory. 
Suppose  we  want  to  compare  the  local  histogram  against  a  library  of  ground  truth 
histograms  and  assign  a  label  based  on  the  closest  match  in  the  library.  Not  only 
does  computing  the  local  histograms  require  a  significant  amount  of  memory,  but 
the  size  of  our  library  is  limited  because  of  memory.  In  order  to  reduce  this  cost, 
one  could  study  sparse  representations  of  local  histograms,  that  is,  see  when  they 
can  be  written  as  combinations  of  a  few  essential  fundamental  distributions.  One 
could  also  investigate  how  classification  accuracy  suffers,  if  at  all,  as  a  result  of  such 
sparse  approximation.  Second,  we  note  that  up  to  this  point,  we  have  focused  on 
local  histograms  in  the  case  where  the  images  in  question  are  discrete.  By  extending 
this  theory  to  the  continuous  case,  one  could  apply  local  histograms  not  just  to  the 
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pixel  values  themselves,  but  also  to  real-valued  quantities  derived  from  them,  such  as 
image  orientations.  In  particular,  one  could  study  how  the  local  histogram  transform 
interacts  with  multiscale  moment  transforms  and  incorporate  orientation  information 
into  the  segmentation-and-classihcation  algorithm.  Such  geometrical  context  is  vital 
when  color  alone  is  insufficient  for  proper  classification.  One  could  also  exploit  the 
theory  of  frames  and  filter  banks  to  generalize  the  local  histogram  transform  in  a  way 
that  allows  it  to  incorporate  both  high  and  low  spatial  frequency  information.  Next, 
one  could  use  information  theory  metrics  to  compare  the  information  content  of  the 
RGB  and  RB  images.  One  could  study  how  the  distinct  red-blue  pairs  are  related  to 
the  possible  green  values  and  study  when  the  green  channel  may  be  disregarded  in  the 
classification  scheme.  Finally,  one  could  also  apply  this  algorithm  to  a  broader  class 
of  images,  such  as  aerial  and  satellite  images.  Indeed,  being  able  to  automatically 
classify  the  terrain  in  such  images  has  applications  in  image  registration  and  visual 
navigation  systems. 
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